Appendix


Preface to the Third Edition 


The Wiley editions, 1987 and 1997, and the Russian editions, 1999 
and 2012, of the book received reasonable attention. Apart from 
some 25 reviews in scientific journals and several citations, the 
book was used effectively for courses and seminars in probability 
and also for comprehensive PhD exams at universities in the USA. I
was invited to universities in Europe and America to deliver 
special lectures on counterexamples and their role in teaching and 
research. 

Leaving positive reactions aside, I was more concerned by
letters and messages containing different sorts of complaints.

Some colleagues were angrily asking why I had not used 
counterexamples from their papers and had not included the papers 
in the References. I apologized to all of them promising to correct 
this in the next edition. 

Other colleagues were not happy about the Index and references 
for the examples because I had not provided the pages thus leaving 
the reader to do his/her own search in a book, journal, etc. Others, 
on the contrary, shared with me that exactly when ‘digging’ in 
books they not only found the desired details but also discovered, 
e.g., how wonderful the books by A. Rényi, K.L. Chung, A.N.
Shiryaev, P. Billingsley and L.C.G. Rogers & D. Williams are.

Complaints came from readers who wanted to cite examples 
from the 2nd edition of my book in their papers, and needed to 
include the Mathematical Reviews data. But while the 1st edition
was reviewed (see MR 89f:60001), strangely enough, the 2nd was 
not. Ask Mathematical Reviews to explain this mystery. 

I received letters and messages from readers, mainly students, 
who liked the book in many ways but were unable to pay Wiley a
price of £200 for a copy on demand or to Amazon a price of $250 
and even more. I answered invariably: do not buy it, there will be a 
new edition with a reasonable price. 


Finally, here is an episode that happened during the Joint 
Annual Meeting of AMS and MAA, January 1997, San Diego, 
California. I was there with a fresh copy of the 2nd Wiley edition, 
just published. Among those who showed a keen interest in my
book was John Grafton, Senior Editor at Dover Publications, New 
York. Returning the book to me after perusing it for half an hour 
he said: “One day I will publish this book at Dover!” That day 
arrived. 

Meanwhile, I accumulated a large amount of new material on 
old and new topics. In order to reflect these developments I 
prepared for the Dover edition a special Appendix containing key 
words followed by references. The main text of the 2nd Wiley 
edition was checked again, slightly revised, corrected and updated. 
As before, I followed the rule: ‘Always pick the lightest item that
still fits in the bag!’

Despite some difficult years in the past, I have been a lucky 
man! My unusual life started from Divotino/Pernik (Bulgaria) and 
my later studies, work and travels brought me via Sofia and 
Moscow to Montreal, Paris, New York, Tokyo, Rio de Janeiro, 
Taipei and London. In one or another way, I was inspired in my 
work and life by remarkable people. I will only mention here a few 
names: Ivan Tiufekchiev, Albert N. Shiryaev, Constance Van 
Eeden, Fortunato Pesarin, Alain Le Breton, Paulo Ribenboim, Bart 
Braden, Masaaki Kijima and Gwo Dong Lin. I would like to use 
this opportunity to express my deep gratitude to all of them for 
their invariable support and precious help. 

It would also be absolutely necessary to mention here the names 
of great contributors to Modern Probability. It was in the late 1970s
and early 1980s that I discussed this project with
A.N. Kolmogorov (1903-1987), B.V. Gnedenko (1912-1995),
D.G. Kendall (1918-2007), K. Itô (1915-2008) and J.L. Doob
(1910-2004). Their interest, suggestions and encouraging 
comments were more than stimulating for my work. What they 


expressed could be summarized briefly as follows: “It is nice to see 
that a young and enthusiastic mathematician from Bulgaria, 
graduate of Moscow State University, is determined to complete an 
ambitious and original project and publish a book which will be 
welcomed by everybody in the area of probability.” 

The material in this book is very diverse. However, the readers, 
as well as their needs and goals, are also different. Those who find 
intriguing facts and statements are expected to work out the 
necessary details in any particular case. It will enhance better 
understanding of the subject, keep their eyes open and prepare 
their minds for future challenges. 

The pleasure of knowing, using and constructing
counterexamples in probability can only be compared with the 
pleasure of walking horizontally, thinking vertically and finding 
beautiful “items” between. 

As always, comments and suggestions from readers are very 
welcome. 

The layout and LaTeX typesetting of the present edition are due
to the superb skills of Venelin Chernogorov. I am very grateful to 
him. 

At the very end (but not at the end of the World!), I will not miss 
the chance to tell the readers that I am sending the material to 
Dover Publications at a unique historical moment that is recorded 
by an extended use of the number 12: 


12.12.12 at 12:12 (GMT).


December 2012 Jordan Stoyanov (Dancho) 
Newcastle upon Tyne, U.K. stoyanovj@gmail.com


Preface to the Second Edition 


A large amount of newly collected and created material and the 
lively interest in the first edition of this book (CEP-1) motivated
me towards the second edition (CEP-2). Actually, I have never 
stopped looking for new counterexamples or thinking about how to 
achieve completeness and clarity as far as possible in this work. 

My strategy was to keep the best from CEP-1, replace some 
examples by new and more attractive ones and add entirely new 
examples taken from recent publications or invented especially for 
CEP-2. Thus the reader will find several original topics well 
supplementing the material in CEP-1. 

Among the topics essentially extended are 
independence/dependence/exchangeability properties of sets of
random events and random variables, characterization of 
probability distributions, the moment problem, martingales and 
limit theorems. Clearer interpretations of many statements and 
improvements in presentation have been made in all sections. The 
text of CEP-2 is more compact. However, much material has 
remained unused in order to keep the book a reasonable size. The 
Index, Supplementary Remarks and the References have been 
updated and extended accordingly. 

My work on CEP-2 took a long time and, as always, my 
enthusiasm was based on my strong belief about the importance of 
the role of counterexamples to everyone teaching or learning 
probability theory. Additional stimuli came from the positive 
reactions of so many colleagues in so many countries. Like many 
others I experienced difficulties during this time and first had to 
solve the problem of how to survive in this changing and 
unpredictable world. I now use this opportunity to express sincere 
thanks to many colleagues and friends for their attention and 
support during my visits to several universities in The Netherlands, 
Great Britain, Russia, Italy, Canada, USA, France and Spain. In 


particular, large portions of CEP-2 were prepared when I was 
visiting Queen’s University (Kingston, Ontario) and Miami 
University (Oxford, Ohio). The last stages of this work were 
undertaken during a recent visit to Université Joseph Fourier
(Grenoble) and in Sofia just before my trip to Kentucky. 

I am very grateful for my collaboration with John Wiley & Sons 
(Chichester). The attention, the patience and the help of Helen 
Ramsey and Jenny Smith were much appreciated. My thanks go to 
them and to all the staff at Wiley. 

Finally, I hope that you, the reader, will benefit from this edition 
and my belief that new counterexamples will be created as an 
essential part of the further development of probability theory. As 
before, any new suggestions are welcome! 


July/August 1996 
Europe/America Jordan Stoyanov


Preface to the First Edition 


General comments. We have used the term counterexample in the 
same sense as generally accepted in mathematics. Three previous 
books related to counterexamples: on analysis (Gelbaum and 
Olmsted 1964), on topology (Steen and Seebach 1978) and on 
graph theory (Capobianco and Molluzzo 1978), have been and still 
are popular among mathematicians. The present book is a 
collection of counterexamples covering important topics in the 
field of probability theory and stochastic processes. 

It is not only traditional theorems, proofs and illustrative 
examples, but also counterexamples, which reflect the power, the 
width, the depth, the degree of non-triviality and the beauty of the 
theory. 

If we have found necessary and sufficient conditions for some 
statement or result, then any change in the conditions implies that 
the result is false and accordingly the statement has to be modified. 
Our attention is focused on interesting questions concerning: (a) 
the necessity of some sufficient conditions; (b) the sufficiency of 
certain necessary conditions; (c) the validity of a statement which 
is the converse to another statement. However, we have included 
some useful and instructive examples which can be interpreted as 
counterexamples in a generalized sense. 

Purpose of the book. The present book is intended to serve as a 
supplementary source for many courses in the field of probability 
theory and stochastic processes. The topics dealt with in the book, 
and the level of counterexamples, are chosen in such a way that it 
becomes a multi-purpose book. Firstly, it can be used for any 
standard course in probability theory for undergraduates. Secondly, 
some of the material is suitable for advanced courses in probability 
theory and stochastic processes, assuming that the students have 
had a course in measure theory and function theory. Thirdly, young 
researchers and even professionals will find the book useful and 


may discover new and strange results. The wide variety of content 
and detail in the discussions of the counterexamples may also help 
lecturers and tutors in their teaching. 

It should be noted that some of the examples considered in the
book give the reader an opportunity to become more familiar with
standard results in probability and stochastic processes and to
develop a better understanding of the subject. However, there exist
some examples which are more difficult and their mastering
requires a considerable amount of additional work.

Content and structure of the book. The present book includes a
relatively large number of counterexamples. Their choice was not 
easy. We have tried to include a variety of counterexamples 
concerning different topics in probability theory and stochastic 
processes. Though we have avoided trivial examples, we have 
nonetheless included some which cover elementary matters. 
Pathological examples have been completely avoided. The 
examples which are most useful and interesting fall in between 
these two categories. 

The material of the book is divided into 4 chapters and 25 
sections. Each section begins with short introductory notes giving 
basic definitions and main results. Then we present the 
counterexamples related to the main results, the motivation for 
questions and the counter-statements. Some notions and results are 
given and analysed in the counterexamples themselves. All 
counterexamples are named and numbered for the convenience of 
readers. 

The counterexamples range over various degrees of difficulty. 
Some are elementary and well known counterexamples and can be 
classified as a part of a probabilistic folklore. Also the style of 
presentation needs to vary. Some of the counterexamples are only 
briefly described to economize on space and to provide the reader 
with a chance for independent work. 

Readers of the book are assumed to be familiar with the basic 


notions and results in probability theory and stochastic processes. 
Some references are given to textbooks and lecture notes which 
provide the necessary background to the subject. 

At the end of the book, Supplementary Remarks are included
providing references and some additional explanations for the 
majority of the counterexamples. For most of the examples we 
have given at least one relevant early reference. Many of the 
counterexamples originate from individual probabilists and 
statisticians and we have cited them fully. Other sources are also 
indicated where the reader can find new counterexamples, ideas for 
such examples or some questions whose answers would lead to 
interesting and useful counterexamples. The Supplementary 
Remarks give readers the opportunity for further work. 

Note about references. References Dudley (1972) and (Dudley 
1976) indicate a paper or book published by Dudley in 1972 or 
1976 respectively. For convenience we have devised abbreviated 
names for the principal journals in the field of probability theory, 
stochastic processes and mathematical statistics. In all other cases 
standard international abbreviations are used. 

History of the book. The book is a result of 16 years of my study
in the field of probability theory and stochastic processes. I started 
to collect counterexamples in 1970 when I was a student at 
Moscow University and later it became an intriguing
preoccupation. As a result I increased the number of
counterexamples to 500 or so. Many of the counterexamples or 
different versions of them belong to other authors. Some new and 
fresh counterexamples were created by colleagues and friends 
especially for this book. During the preparation of the book I have 
been guided by my own experience in lecturing on these topics in 
several European and Canadian universities and in giving special 
seminars in recent years for students of Sofia University. 

The international character of the book is obvious. It is not only
my opinion that the present book is an example, not a 


counterexample, of a successful collaboration and friendship 
among mathematicians from different countries. 

Acknowledgements. The selection and presentation of the
material in the book, aimed at covering the wide field of
probability theory and stochastic processes, has not been an easy
task. I was grateful for the opportunity to discuss the project with
my many colleagues and friends whose advice and valuable
suggestions were extremely helpful. I wish to express my thanks to
all of them.

My special thanks are addressed to my teachers Prof. B. V. 
Gnedenko, Prof. Yu. V. Prohorov and Prof. A. N. Shiryaev for 
their attention, general and specific suggestions and
encouragement. Among colleagues and friends I have to mention
N. V. Krylov, R. Sh. Liptser, A. A. Novikov, Yu. M. Kabanov, S.
E. Kuznetsov, A. M. Zubkov, O. B. Enchev and S. D. Gaidov with
whom I had very useful discussions on several concrete topics. 

My thanks are directed to all colleagues who were so kind as to 
send me their specific suggestions. The names of these colleagues 
are included in the list of references. 

I use the opportunity to express my special gratitude to Prof. A.
T. Fomenko for providing five of his extraordinary drawings 
especially for this book. 

I wish to thank Prof. D. G. Kendall for his interest in my work
and for his constructive suggestions and encouragement. 

The comments of the anonymous referees and the editor helped 
me to improve both the content and the style of the presentation. I 
express my appreciation to them. 

Finally I should like to thank the collaborators of John Wiley & 
Sons (Chichester) for their patience and for their precise and 
excellent work. It is my pleasure to mention the names of Charlotte 
Farmer and Ian McIntosh. 

Suggestions and comments from readers are most welcome and 
will, if appropriate, be reflected in any subsequent editions of the 


book. 


June 1986, Sofia Jordan Stoyanov 


Basic Notation and Abbreviations


$(\Omega, \mathcal{F}, P)$ — probability space
$\sigma(\mathcal{C})$ — $\sigma$-field generated by the class $\mathcal{C}$
$EX$, $VX$ — expectation and variance of the r.v. $X$
$E[X|\mathcal{D}]$ — conditional expectation given $\sigma$-field $\mathcal{D}$
$\mathbb{R}^n$ — $n$-dimensional Euclidean space
$\mathcal{B}^n$ — Borel $\sigma$-field in $\mathbb{R}^n$
$\mathbb{N}$ — set of all natural numbers: $\mathbb{N} = \{1, 2, \ldots\}$
$\mathbb{R}^+$ — set of non-negative numbers: $\mathbb{R}^+ = [0, \infty)$
$1_B$, $1(B)$ — indicator function of the set $B$
$\mathcal{N}(a, \sigma^2)$ — normal distribution with parameters $a$ and $\sigma^2$
$\Phi$ — the standard normal d.f.
$F_1 * F_2$ — convolution of the d.f.s $F_1$ and $F_2$
$:=$ — equal by definition
$x \vee y$ — $x \vee y := \max\{x, y\}$
$x \wedge y$ — $x \wedge y := \min\{x, y\}$
$\Rightarrow$, $\Leftarrow$ — logical implications
iff — if and only if
r.v. — random variable
d.f. — distribution function
ch.f. — characteristic function
i.i.d. — independent and identically distributed
a.s. — almost surely
i.o. — infinitely often
u.a.n. — uniform asymptotic negligibility
$\xrightarrow{d}$ — convergence in distribution
$\xrightarrow{P}$ — convergence in probability
$\xrightarrow{L^r}$ — convergence in $L^r$-sense
$\xrightarrow{w}$ — weak convergence
$\xrightarrow{a.s.}$ — almost sure convergence
$\xrightarrow{var}$ — convergence in variation
CLT — central limit theorem
WLLN — weak law of large numbers
SLLN — strong law of large numbers
IFR — increasing failure rate
IFRA — increasing failure rate average
NBU — new and better than used


Chapter 1 


Classes of Random Events and Probabilities


SECTION 1. CLASSES OF RANDOM EVENTS


Let $\Omega$ be an arbitrary non-empty set. Its elements, denoted by $\omega$,
will be interpreted as outcomes (results) of some experiment. As
usual, we use $A \cup B$ and $A \cap B$ (as well as $AB$) to represent the union
and the intersection of any two subsets $A$ and $B$ of $\Omega$ respectively.
Also, $A^c$ is the complement of $A \subset \Omega$. In particular, $\Omega^c = \emptyset$ where $\emptyset$
is the empty set.

The class $\mathcal{A}$ of subsets of $\Omega$ is called a field if it contains $\Omega$ and
is closed under the formation of complements and finite unions,
that is if:

(a) $\Omega \in \mathcal{A}$;
(b) $A \in \mathcal{A} \Rightarrow A^c \in \mathcal{A}$;
(c) $A_1, A_2 \in \mathcal{A} \Rightarrow A_1 \cup A_2 \in \mathcal{A}$.

Taking into account the so-called de Morgan laws,
$(A_1 A_2)^c = A_1^c \cup A_2^c$ and $(A_1 \cup A_2)^c = A_1^c A_2^c$, we easily see that
(c) can be replaced by the condition

(c') $A_1, A_2 \in \mathcal{A} \Rightarrow A_1 A_2 \in \mathcal{A}$.

Thus $\mathcal{A}$ is closed under finite intersections.

The class $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-field if it is a field and
if it is closed under the formation of countable unions, that is if:

(d) $A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.

Again, as above, condition (d) can be replaced by

(d') $A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$

and clearly the $\sigma$-field $\mathcal{F}$ is closed under countable intersections.

Recall that the elements of any field or $\sigma$-field are called
random events (or simply, events). Other classes of events, such
as the semi-field, D-system, and product of $\sigma$-fields, will be
defined and compared with one another in the examples below.
Any textbook on probability theory contains a detailed
presentation of all these basic ideas (see Kolmogorov 1956; 
Breiman 1968; Gihman and Skorohod 1974/1979; Chung 1974; 


Neveu 1965; Chow and Teicher 1978; Billingsley 1995; Shiryaev 
1995). The examples given in this section concern some of the 
properties of different classes of random events and examine the 
relationship between notions which seem to be close to one 
another. 


1.1. A class of events which is a field but not a $\sigma$-field


Let $\Omega = [0, \infty)$ and $\mathcal{F}_1$ be the class of all intervals of the type $[a, b)$
or $[a, \infty)$ where $0 \leq a < b < \infty$. Denote by $\mathcal{F}_2$ the class of all finite
sums of intervals of $\mathcal{F}_1$. Then $\mathcal{F}_1$ is not a field, and $\mathcal{F}_2$ is a field but
not a $\sigma$-field.

Take arbitrary numbers $a$ and $b$, $0 < a < b < \infty$. Then $A = [a, b) \in \mathcal{F}_1$.
However, $A^c = [0, a) \cup [b, \infty) \notin \mathcal{F}_1$ and thus $\mathcal{F}_1$ is not a field.

It is easy to see that: (i) the finite union of finite sums of
intervals (of $\mathcal{F}_1$) is again a sum of intervals; (ii) the complement of
a finite sum of intervals is also a sum of intervals. This means that
$\mathcal{F}_2$ is a field. However, $\mathcal{F}_2$ is not a $\sigma$-field because, for example, the
set $A_n = [0, 1/n) \in \mathcal{F}_2$ for each $n = 1, 2, \ldots$, and the intersection
$\bigcap_{n=1}^{\infty} A_n = \{0\}$ does not belong to $\mathcal{F}_2$.

Let us look at two additional cases.

(i) Let $\Omega = \mathbb{R}^1$ and $\mathcal{F}$ be the class of all finite sums of intervals of
the type $(-\infty, a]$, $(b, c]$ and $(d, \infty)$. Then $\mathcal{F}$ is a field. But the
intersection $\bigcap_{n=1}^{\infty} (b - 1/n, c]$ is equal to $[b, c]$ which does not
belong to $\mathcal{F}$. Hence the field $\mathcal{F}$ is not a $\sigma$-field.

(ii) Let $\Omega$ be any infinite set and $\mathcal{A}$ the collection of all subsets
$A \subset \Omega$ such that either $A$ or its complement $A^c$ is finite. Then it is easy
to see that $\mathcal{A}$ is a field but not a $\sigma$-field.


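1.2. A class of events can be closed under finite unions
and finite intersections but not under complements


Let $\Omega = \mathbb{R}^1$ and the class $\mathcal{A}$ consist of intervals of the type $(x, \infty)$,
$x \in \Omega$. Then using the notations $u = x \wedge y := \min\{x, y\}$ and
$v = x \vee y := \max\{x, y\}$ we have:

$$(x, \infty) \cup (y, \infty) = (u, \infty) \in \mathcal{A}, \qquad (x, \infty) \cap (y, \infty) = (v, \infty) \in \mathcal{A}.$$

However, $(x, \infty)^c = (-\infty, x] \notin \mathcal{A}$.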


1.3. A class of events which is a semi-field but not a field


Let $\Omega$ be an arbitrary set. A non-empty class $\mathcal{J}$ of subsets of $\Omega$ is
called a semi-field if $\Omega \in \mathcal{J}$, $\emptyset \in \mathcal{J}$, $\mathcal{J}$ is closed under the formation of
finite intersections, and the complement of any set in $\mathcal{J}$ is a finite
sum of disjoint sets of $\mathcal{J}$. It is easy to see that any field of subsets of
$\Omega$ is also a semi-field. However, the following simple examples
show that the converse is not true.

(i) Let $\Omega = [-\infty, \infty)$ and $\mathcal{J}_1$ contain $\Omega$, $\emptyset$ and all intervals of the
type $[a, b)$ where $-\infty \leq a \leq b \leq \infty$. Then $\emptyset \in \mathcal{J}_1$, $\Omega \in \mathcal{J}_1$,
$[a_1, b_1) \cap [a_2, b_2) = [a_1 \vee a_2, b_1 \wedge b_2) \in \mathcal{J}_1$ and $[a, b)^c = [-\infty, a) \cup [b, \infty)$. So
$\mathcal{J}_1$ is a semi-field. Obviously $\mathcal{J}_1$ is not a field.

(ii) Take $\Omega = \mathbb{R}^1$ and denote by $\mathcal{J}_2$ the class of all subsets of the
form $AB$ $(= A \cap B)$ where $A$ is a closed and $B$ is an open set in $\Omega$.
Then again, $\mathcal{J}_2$ is a semi-field but not a field.


1.4. A $\sigma$-field of subsets of $\Omega$ need not contain all subsets
of $\Omega$


Recall that the set $A \subset \Omega$ is called a co-finite set if its complement
$A^c$ is finite. Let $\mathcal{F}_1$ consist of the finite and co-finite subsets of $\Omega$.
Then $\mathcal{F}_1$ is a field. It is a $\sigma$-field iff $\Omega$ is finite.

Further, the set $A \subset \Omega$ is called a co-countable set if $A^c$ is
countable. Let $\mathcal{F}_2$ consist of the countable and the co-countable
subsets of $\Omega$. Then it is easy to check that $\mathcal{F}_2$ is a $\sigma$-field.

Suppose now that $\Omega$ is uncountable. Then $\Omega$ contains a subset $A$
such that $A$ and $A^c$ are both uncountable. This shows that in
general a $\sigma$-field of $\Omega$ need not contain all subsets of $\Omega$ and need
not be closed under the formation of arbitrary uncountable unions.


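1.5. Every $\sigma$-field of events is a D-system, but the
converse does not always hold


A system $\mathcal{D}$ of subsets of a given set $\Omega$ is called a D-system
(Dynkin system) in $\Omega$ if the following three conditions hold:
(i) $\Omega \in \mathcal{D}$; (ii) $A, B \in \mathcal{D}$ and $A \subset B \Rightarrow B \setminus A \in \mathcal{D}$;
(iii) $A_n \in \mathcal{D}$, $n = 1, 2, \ldots$ and $A_1 \subset A_2 \subset \ldots \Rightarrow \bigcup_{n=1}^{\infty} A_n \in \mathcal{D}$.

It is obvious that every $\sigma$-field is a D-system, but the converse
may not be true, as can be seen in the following example.

Take $\Omega = \{\omega_1, \omega_2, \ldots, \omega_{2n}\}$, $n \in \mathbb{N}$. Denote by $\mathcal{D}_e$ the
collection of all subsets $D \subset \Omega$ consisting of an even number of
elements. Conditions (i), (ii) and (iii) above are satisfied, and
hence $\mathcal{D}_e$ is a D-system. However, if $n > 1$ and we take $A = \{\omega_1, \omega_2\}$
and $B = \{\omega_2, \omega_3\}$, we see that $A \in \mathcal{D}_e$, $B \in \mathcal{D}_e$ and $AB = \{\omega_2\} \notin \mathcal{D}_e$.
Thus $\mathcal{D}_e$ is not even a field and hence not a $\sigma$-field.

Note that a D-system $\mathcal{D}$ is a $\sigma$-field iff the intersection of any
two sets in $\mathcal{D}$ is again in $\mathcal{D}$ (see Dynkin 1965; Bauer 1996).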
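The even-subsets example can be checked by brute force on a small case. The following is a minimal Python sketch, assuming the case n = 2 and writing the outcomes as 1, ..., 4 in place of the four elements of the sample space; the helper names (`even_subsets`, `is_d_system`, `closed_under_intersection`) are illustrative and not taken from the text.

```python
from itertools import combinations


def even_subsets(omega):
    """All subsets of omega with an even number of elements (the empty set included)."""
    items = sorted(omega)
    return {frozenset(c)
            for r in range(0, len(items) + 1, 2)
            for c in combinations(items, r)}


def is_d_system(omega, family):
    """Check the D-system conditions for a family of subsets of a finite set.

    On a finite set an increasing sequence of sets is eventually constant,
    so its union is its largest member and condition (iii) holds
    automatically. Only (i) and (ii) are checked.
    """
    omega = frozenset(omega)
    if omega not in family:                       # condition (i)
        return False
    return all(b - a in family                    # condition (ii)
               for a in family for b in family if a <= b)


def closed_under_intersection(family):
    """True if the intersection of any two members is again a member."""
    return all(a & b in family for a in family for b in family)


omega = {1, 2, 3, 4}                  # assumed case n = 2
d_e = even_subsets(omega)
A, B = frozenset({1, 2}), frozenset({2, 3})

print(is_d_system(omega, d_e))             # the D-system conditions
print(closed_under_intersection(d_e))      # closure under intersection
print(sorted(A & B), (A & B) in d_e)       # the witness AB and its membership
```

The design choice here is to test closure under intersection directly, since by the note above this is exactly the property that separates a D-system from a $\sigma$-field.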


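1.6. Sets which are not events in the product $\sigma$-field


Given two arbitrary sets $\Omega_1$ and $\Omega_2$, we denote their product by
$\Omega_1 \times \Omega_2$: $\Omega_1 \times \Omega_2 = \{(\omega_1, \omega_2) : \omega_1 \in \Omega_1, \omega_2 \in \Omega_2\}$. For any set
$A \subset \Omega_1 \times \Omega_2$ we denote by $A_{\omega_1}$ the section of $A$ at $\omega_1$:
$A_{\omega_1} = \{\omega_2 \in \Omega_2 : (\omega_1, \omega_2) \in A\}$. Analogously,
$A_{\omega_2} = \{\omega_1 \in \Omega_1 : (\omega_1, \omega_2) \in A\}$.
A rectangle in $\Omega_1 \times \Omega_2$ is a subset of the form

$$A_1 \times A_2 = \{(\omega_1, \omega_2) : \omega_1 \in A_1, \omega_2 \in A_2\}, \quad A_1 \subset \Omega_1, \ A_2 \subset \Omega_2.$$

$A_1 \times A_2$ is called a measurable rectangle (with respect to $\mathcal{F}_1$ and $\mathcal{F}_2$)
if $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$ where $\mathcal{F}_1$ and $\mathcal{F}_2$ are $\sigma$-fields of subsets of
$\Omega_1$ and $\Omega_2$ respectively. The measurable rectangles form a
semi-field of subsets in $\Omega_1 \times \Omega_2$. Thus the field generated by the
measurable rectangles consists of all finite sums of disjoint
measurable rectangles. The $\sigma$-field generated by this field is
denoted by $\mathcal{F}_1 \times \mathcal{F}_2$ and is called the product $\sigma$-field of $\mathcal{F}_1$ and $\mathcal{F}_2$.

Let us note the following result (see Neveu 1965; Kingman and
Taylor 1966). For every measurable set $A$ in $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$
and every fixed $\omega_1 \in \Omega_1$ and $\omega_2 \in \Omega_2$, the sections $A_{\omega_1}$ and $A_{\omega_2}$
are measurable sets in $(\Omega_2, \mathcal{F}_2)$ and $(\Omega_1, \mathcal{F}_1)$ respectively.

However, the converse is not true. To see this, let $\Omega$ be any
uncountable set and $\mathcal{F}$ the smallest $\sigma$-field of subsets of $\Omega$
containing all one-point elements. Then the diagonal
$D = \{(\omega, \omega) : \omega \in \Omega\}$ of $\Omega \times \Omega$ does not belong to the product $\sigma$-field $\mathcal{F} \times \mathcal{F}$,
although all its sections belong to $\mathcal{F}$. In other words, for each
$\omega \in \Omega$, the section $D_\omega \in \mathcal{F}$ and is an event but $D \notin \mathcal{F} \times \mathcal{F}$ and is not an
event.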


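1.7. The union of a sequence of $\sigma$-fields need not be a
$\sigma$-field


Let $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be a sequence of $\sigma$-fields of subsets of the set $\Omega$.
Then their intersection $\bigcap_{n=1}^{\infty} \mathcal{F}_n$ is always a $\sigma$-field and it is natural
to ask whether the union $\bigcup_{n=1}^{\infty} \mathcal{F}_n$ is a $\sigma$-field. We shall now show
that the answer to this question is negative.

Consider the set $\Omega = \{\omega_1, \omega_2, \omega_3\}$ and the following two classes
of its subsets: $\mathcal{F}_1 = \{\emptyset, \{\omega_1\}, \{\omega_2, \omega_3\}, \Omega\}$,
$\mathcal{F}_2 = \{\emptyset, \{\omega_2\}, \{\omega_1, \omega_3\}, \Omega\}$. Then $\mathcal{F}_1$ and $\mathcal{F}_2$ are fields and hence
$\sigma$-fields. Obviously the intersection $\mathcal{F}_1 \cap \mathcal{F}_2 = \{\emptyset, \Omega\}$, the trivial
$\sigma$-field. However, the union

$$\mathcal{F} = \mathcal{F}_1 \cup \mathcal{F}_2 = \{\emptyset, \{\omega_1\}, \{\omega_2\}, \{\omega_2, \omega_3\}, \{\omega_1, \omega_3\}, \Omega\}$$

is not a field, and hence not a $\sigma$-field because the element
$\{\omega_1\} \cup \{\omega_2\} = \{\omega_1, \omega_2\} \notin \mathcal{F}$.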
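Since the sample space in this example has only three points, the claims can be verified exhaustively. The following is a minimal Python sketch, with the outcomes written as 1, 2, 3; the helper `is_field` is an illustrative name, and on a finite set a field is automatically a $\sigma$-field, so checking the field axioms suffices.

```python
def is_field(omega, family):
    """Check the field axioms for a family of subsets of a finite set:
    contains omega, closed under complements, closed under pairwise unions."""
    omega = frozenset(omega)
    if omega not in family:
        return False
    closed_compl = all(omega - a in family for a in family)
    closed_union = all(a | b in family for a in family for b in family)
    return closed_compl and closed_union


omega = frozenset({1, 2, 3})
f1 = {frozenset(), frozenset({1}), frozenset({2, 3}), omega}
f2 = {frozenset(), frozenset({2}), frozenset({1, 3}), omega}

intersection = f1 & f2      # classes common to both: the trivial field
union = f1 | f2             # set-theoretic union of the two classes
witness = frozenset({1}) | frozenset({2})

print(is_field(omega, f1), is_field(omega, f2))
print(is_field(omega, intersection))
print(is_field(omega, union), witness in union)
```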


SECTION 2. PROBABILITIES 


Let $\Omega$ be any set and $\mathcal{A}$ be a field of its subsets. We say that $P$ is a
probability on the measurable space $(\Omega, \mathcal{A})$ if $P$ is defined for all
events $A \in \mathcal{A}$ and satisfies the following axioms.

(a) $P(A) \geq 0$ for each $A \in \mathcal{A}$; $P(\Omega) = 1$.
(b) $P$ is finitely additive. That is, for any finite number of pairwise
disjoint events $A_1, \ldots, A_n \in \mathcal{A}$ we have

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i).$$

(c) $P$ is continuous at $\emptyset$. That is, for any events $A_1, A_2, \ldots \in \mathcal{A}$
such that $A_{n+1} \subset A_n$ and $\bigcap_{n=1}^{\infty} A_n = \emptyset$, it is true that

$$\lim_{n \to \infty} P(A_n) = 0.$$

Note that conditions (b) and (c) are equivalent to the next one (d).

(d) $P$ is $\sigma$-additive (countably additive), that is

$$P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} P(A_n)$$

for any events $A_1, A_2, \ldots \in \mathcal{A}$ which are pairwise disjoint.

According to the Carathéodory theorem (see Kolmogorov 1956;
Loève 1977; Shiryaev 1995), if $P_0$ is a $\sigma$-additive probability on
$(\Omega, \mathcal{A})$ and $\mathcal{F} = \sigma(\mathcal{A})$ denotes the smallest $\sigma$-field generated by the
field $\mathcal{A}$, then there is a unique probability measure $P$ on $(\Omega, \mathcal{F})$
which is an extension of $P_0$ in the sense that $P(A) = P_0(A)$ for
$A \in \mathcal{A}$. In this case we also say that $P_0$ is a restriction of $P$ over $\mathcal{A}$ and
write $P|\mathcal{A} = P_0$.

The ordered triplet $(\Omega, \mathcal{F}, P)$ is called a probability space if:

$\Omega$ is any set of points called elementary events (outcomes);

$\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$; the elements of $\mathcal{F}$ are events;

$P$ is a probability on $\mathcal{F}$, that is $P$ satisfies conditions (a), (b) and
(c) above, or, equivalently, (a) and (d).


Thus we have described the axiomatic system which is generally 
accepted in probability theory. This system was suggested by A. N. 
Kolmogorov in 1933 (see Kolmogorov 1956). 

In this section we present a few examples characterizing some of 
the properties of probability measures. The important notion of 
conditional probability is introduced and treated in Example 2.4. 


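2.1. A probability measure which is additive but not
$\sigma$-additive


Let $\Omega$ be the set of all rational numbers $r$ of the unit interval $[0, 1]$
and $\mathcal{F}_1$ the class of the subsets of $\Omega$ of the form $[a, b]$, $(a, b]$, $(a, b)$
or $[a, b)$ where $a$ and $b$ are rational numbers. Denote by $\mathcal{F}_2$ the
class of all finite sums of disjoint sets of $\mathcal{F}_1$. Then $\mathcal{F}_2$ is a field. Let
us define the probability measure $P$ as follows:

$$P(A) = b - a, \quad \text{if } A \in \mathcal{F}_1,$$
$$P(B) = \sum_{i=1}^{n} P(A_i), \quad \text{if } B \in \mathcal{F}_2, \text{ that is } B = \sum_{i=1}^{n} A_i, \ A_i \in \mathcal{F}_1.$$

Consider two disjoint sets of $\mathcal{F}_2$, say

$$B = \sum_{i=1}^{n} A_i \quad \text{and} \quad B' = \sum_{j=1}^{m} A'_j$$

where $A_i, A'_j \in \mathcal{F}_1$ and all $A_i$, $A'_j$ are disjoint. Then
$B + B' = \sum_{k=1}^{n+m} C_k$, where either $C_k = A_i$ for some $i = 1, \ldots, n$,
or $C_k = A'_j$ for some $j = 1, \ldots, m$. Moreover,

$$P(B + B') = P\Big(\sum_{k=1}^{n+m} C_k\Big) = \sum_{k=1}^{n+m} P(C_k) = \sum_{i=1}^{n} P(A_i) + \sum_{j=1}^{m} P(A'_j) = P(B) + P(B')$$

and hence $P$ is an additive measure.

Obviously every one-point set $\{r\} \in \mathcal{F}_1$ and $P(\{r\}) = 0$. Since $\Omega$
is a countable set and $\Omega = \sum_{i=1}^{\infty} \{r_i\}$, we get

$$P(\Omega) = 1 \neq 0 = \sum_{i=1}^{\infty} P(\{r_i\}).$$

This contradiction shows that $P$ is not $\sigma$-additive.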


2.2. The coincidence of two probability measures on a
given class does not always imply their coincidence
on the $\sigma$-field generated by this class


Let $\Omega$ be a set and $\mathcal{C}$ a class of events such that
$A, B \in \mathcal{C} \Rightarrow AB \in \mathcal{C}$ (that is, $\mathcal{C}$ is closed under intersection).
Denote by $\mathcal{F} = \sigma(\mathcal{C})$ the $\sigma$-field generated by $\mathcal{C}$. Let $Q_1$ and $Q_2$ be
two probabilities on the measurable space $(\Omega, \mathcal{F})$. The following
result is well known (see Breiman 1968):

$$Q_1 = Q_2 \text{ on } \mathcal{C} \ \Rightarrow \ Q_1 = Q_2 \text{ on } \mathcal{F}.$$

It is not surprising that results of this kind depend essentially on
the structure of the class $\mathcal{C}$. By an example we show the importance
of the hypothesis that $\mathcal{C}$ is closed under intersection.

Take $\Omega = \{a, b, c, d\}$ and two measures $Q_1$ and $Q_2$ defined as
follows:

$$Q_1(a) = Q_1(d) = Q_2(b) = Q_2(c) = \tfrac{1}{6},$$
$$Q_1(b) = Q_1(c) = Q_2(a) = Q_2(d) = \tfrac{1}{3}.$$

Let $\mathcal{F}$ be the class of all subsets of $\Omega$ and $\mathcal{C} = \{a \cup b, d \cup c, a \cup c, b \cup d\}$.
Here and below $x \cup y$ denotes the two-element set $\{x, y\}$.
Then it is easy to check that $Q_1 = Q_2$ on $\mathcal{C}$. For example,

$$Q_1(d \cup c) = Q_1(d) + Q_1(c) = \tfrac{1}{6} + \tfrac{1}{3} = \tfrac{1}{2},$$
$$Q_2(d \cup c) = Q_2(d) + Q_2(c) = \tfrac{1}{3} + \tfrac{1}{6} = \tfrac{1}{2},$$

and thus $Q_1(d \cup c) = Q_2(d \cup c)$. Analogously, $Q_1(\cdot) = Q_2(\cdot)$ for
all remaining elements of $\mathcal{C}$. However, it is evident from the
definition of $Q_1$ and $Q_2$ that the equality $Q_1(\cdot) = Q_2(\cdot)$ does not
hold for all elements of $\mathcal{F}$; for example, for each of $a$, $b$, $c$ and $d$.
The reason for this is that $\mathcal{C}$, as taken, is not closed under
intersection.
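The four-point example can be checked mechanically. The following is a minimal Python sketch, assuming the point masses 1/6 and 1/3 (any two distinct values x, y with 2x + 2y = 1 would serve the same purpose); the names `q1`, `q2`, `measure` and `c_class` are illustrative and not from the text.

```python
from fractions import Fraction
from itertools import combinations

# Assumed point masses: x on {a, d} and y on {b, c} for Q1, swapped for Q2.
x, y = Fraction(1, 6), Fraction(1, 3)
q1 = {"a": x, "b": y, "c": y, "d": x}
q2 = {"a": y, "b": x, "c": x, "d": y}


def measure(q, event):
    """Probability of a finite event as the exact sum of its point masses."""
    return sum((q[p] for p in event), Fraction(0))


# The class C of two-point sets on which Q1 and Q2 are compared.
c_class = [frozenset("ab"), frozenset("dc"), frozenset("ac"), frozenset("bd")]

agree_on_c = all(measure(q1, e) == measure(q2, e) for e in c_class)
closed = all((e & f) in c_class for e in c_class for f in c_class)

# All 16 subsets of the sample space, and those where the measures differ.
all_events = [frozenset(s) for r in range(5) for s in combinations("abcd", r)]
disagreements = [e for e in all_events if measure(q1, e) != measure(q2, e)]

print(agree_on_c)                 # agreement on every member of C
print(closed)                     # closure of C under intersection
print(sorted("".join(sorted(e)) for e in disagreements))
```

Exact rational arithmetic via `fractions.Fraction` is used so that the equality tests are not affected by floating-point rounding.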


2.3. On the validity of the Kolmogorov extension 


theorem in (R”, B”) 


Recall that the probability measures in the space R”, n > | are 
constructed in the following way: first for elementary sets 


(rectangles of the type (a, b]), then for sets A = }) (a,;,6;|, and 
finally, by using the Caratheodory theorem (see Loéve 1977; 


Shiryaev 1995), for sets in B”. 

A similar construction can be used for the space (R™,B).. 
Indeed, let C,, (B) = P,(B) = P(C,(B)), B&B". denote a 
cylinder set in R® with base B € 8”. It is natural to take the 
cylinder sets as elementary sets in R® with their probabilities 
defined by the probability measure on the sets of B®. 


Suppose P is a probability measure on (R”, B”). Forn = 1, 2,.. 
. we put 


P,41(B x R') = P,(B). 


Thus we obtain a sequence of probability measures P], Po, ... 


defined respectively on (et, py, (R2, B2), Forn=1,2,...andB 
Ee 8” the following consistency (or compatibility) property holds: 


(1) P,41(B x R*) = P,(B). 


We now formulate a fundamental result. 
Kolmogorov theorem. Let Pj, Po, . . . be a sequence of 


probability measures respectively on (1, py, (RZ, B-), a4 
satisfying the consistency property (1). Then there is a unique 


probability measure P on (R™, B”) such that its restriction on B" 
coincides with Py, that is, P (C, (B)) = P, (B), Be B",n=1,2,... 


The proof of this theorem can be found in many textbooks (see
Kolmogorov 1956; Doob 1953; Loève 1977; Neveu 1965; Feller
1971; Billingsley 1995; Shiryaev 1995). Let us note that it uses
several specific properties of Euclidean spaces. However, this
theorem may fail in general (without any hypotheses on the
topological nature of measurable spaces and on the structure of the
family of measures $\{P_n\}$). This is seen from the following
example.
Consider the space $\Omega = (0, 1]$. (Clearly $\Omega$ is not complete.) We
shall construct a sequence of σ-fields $\mathcal{F}_1, \mathcal{F}_2, \ldots$ and a sequence of
probability measures $\{P_n\}$ where $P_n$ is defined on $(\Omega, \mathcal{F}_n)$. Let $\mathcal{F} =
\sigma(\bigcup_n \mathcal{F}_n)$ be the smallest σ-field containing all $\mathcal{F}_n$. Then we shall
show that there is no probability measure $P$ on $(\Omega, \mathcal{F})$ such that its
restriction $P|\mathcal{F}_n$ on $\mathcal{F}_n$ coincides with $P_n$, $n = 1, 2, \ldots$.

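For $n = 1, 2, \ldots$ define the function $h_n(\omega) = 1$ if $0 < \omega < 1/n$ and
$h_n(\omega) = 0$ if $1/n \le \omega \le 1$. Let
$\mathcal{G}_n = \{A \subset \Omega : A = \{\omega : h_n(\omega) \in B\},\ B \in \mathcal{B}^1\}$ and let
$\mathcal{F}_n = \sigma\{\mathcal{G}_1, \ldots, \mathcal{G}_n\}$ be the smallest σ-field containing the sets of
$\mathcal{G}_1, \ldots, \mathcal{G}_n$. Clearly $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \ldots$. On the measurable space
$(\Omega, \mathcal{F}_n)$ define a probability measure $P_n$ as follows:

$$P_n[\omega : (h_1(\omega), \ldots, h_n(\omega)) \in B^n] = \begin{cases} 1, & \text{if } (1, \ldots, 1) \in B^n, \\ 0, & \text{otherwise,} \end{cases}$$

where $B^n \in \mathcal{B}^n$. It is easy to see that the family $\{P_n\}$ is consistent:
if $A \in \mathcal{F}_n$ then $P_{n+1}(A) = P_n(A)$.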

Suppose now that there exists a probability $P$ on the measurable
space $(\Omega, \mathcal{F})$ such that $P|\mathcal{F}_n = P_n$. If so, then for $n = 1, 2, \ldots$

$$P[\omega : h_1(\omega) = \cdots = h_n(\omega) = 1] = P_n[\omega : h_1(\omega) = \cdots = h_n(\omega) = 1] = 1. \tag{2}$$

However, $\{\omega : h_1(\omega) = \cdots = h_n(\omega) = 1\} = (0, 1/n) \downarrow \emptyset$, which
contradicts (2) and the requirement for the set function $P$ to be
σ-additive (or, equivalently, to be continuous at the 'zero' set $\emptyset$).


2.4. There may not exist a regular conditional
probability with respect to a given σ-field


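Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{F}_1$ a σ-field such that
$\mathcal{F}_1 \subset \mathcal{F}$. Recall that the conditional probability $P(A|\mathcal{F}_1)$ is defined
$P$-a.s. as an $\mathcal{F}_1$-measurable function of $\omega$ such that

$$P(AB) = \int_B P(A|\mathcal{F}_1)\, dP(\omega) \quad \text{for each } B \in \mathcal{F}_1.$$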


The conditional probability $P(A|\mathcal{F}_1)$, $A \in \mathcal{F}$, is said to be regular if
there exists a function $P(A, \omega)$, $A \in \mathcal{F}$, $\omega \in \Omega$, which satisfies the
following two properties:

(i) $P(A, \omega) = P(A|\mathcal{F}_1)$ $P$-a.s. for an arbitrary $A \in \mathcal{F}$;
(ii) for fixed $\omega$, $P(\cdot, \omega)$ is a probability measure on $\mathcal{F}$.


If condition (ii) is satisfied and condition (i) holds for all $\omega$ (not
only for $P$-almost all $\omega$), then $P(A|\mathcal{F}_1)$ is called a proper regular
conditional probability. (In terms of distributions we speak about
regular and proper regular conditional distributions.) Regular
conditional probabilities exist in many cases, but proper regular
conditional probabilities do not always exist, as can be seen below.

Let $(\Omega, \mathcal{F}, \lambda)$ be a probability space with $\Omega = [0, 1]$, $\mathcal{F}$ the σ-field
of the Lebesgue measurable sets in $[0, 1]$ and $\lambda$ the Lebesgue
measure. It is well known that in the interval $[0, 1]$ there is a
non-measurable (in Lebesgue sense) set, say $N$, such that its outer
measure is $\lambda^*(N) = 1$ and its inner measure is $\lambda_*(N) = 0$ (for
details see Halmos 1974).
Define a new σ-field $\mathcal{G}$ which is generated by $\mathcal{F}$ and the set $N$.
Thus $\mathcal{G}$ consists of sets of the form $N B_1 \cup N^c B_2$ where $B_1, B_2 \in
\mathcal{F}$. Define also the measure $P$ on the measurable space $([0, 1], \mathcal{G})$
by

$$P(N B_1 \cup N^c B_2) = \tfrac{1}{2}[\lambda(B_1) + \lambda(B_2)].$$


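It is easy to check that $P$ is well defined and defines a
probability on the enlarged σ-field $\mathcal{G}$, so the triplet $([0, 1], \mathcal{G}, P)$ is a
probability space. For every $B \in \mathcal{F}$ we have

$$P(N B \cup N^c B) = P(B) = \lambda(B)$$

and hence $P$ coincides with $\lambda$ on $\mathcal{F}$, that is $P|\mathcal{F} = \lambda$. Moreover,
$P(N) = \tfrac{1}{2}$.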


Now we shall prove the following statement: on the probability
space $([0, 1], \mathcal{G}, P)$ there is no regular conditional probability
$P(A|\mathcal{F})$, $A \in \mathcal{G}$, with respect to the σ-field $\mathcal{F}$.

Suppose such a probability exists: that is, there is a function, say
$P(A, \omega)$, which satisfies the above conditions (i) and (ii). If so, then
for any Borel (and Lebesgue) set $A$, $P(A, \omega) = 1_A(\omega)$. Therefore if
$A$ is a one-point set, $A = \{\omega\}$, then $P(\{\omega\}, \omega) = 1$. Now take the set
$N$. From the definition of a conditional probability and the equality
$P(N) = \tfrac{1}{2}$ we get

$$\tfrac{1}{2} = P(N) = \int_\Omega P(N, \omega)\, \lambda(d\omega).$$


On the other hand, if $P(\cdot, \omega)$ is a measure for each $\omega$, then

$$P(N, \omega) \ge P(\{\omega\}, \omega) = 1 \text{ for all } \omega \in N \implies P(N, \omega) = 1 \text{ for all } \omega \in N.$$


Consider the set $C = \{\omega : P(\{\omega\}, \omega) = 1\}$. Since $P(\cdot, \omega)$ is a Borel
function in $\omega$, then the set $C$ is Borel measurable with $P(C) = 1$.
Let $D = \{\omega : P(N, \omega) = 1\}$. It is clear that $D$ is Borel-measurable
and $D \supset CN$, which implies that $D \cup C^c \supset N$. However, the set
$D \cup C^c$ is Borel and covers the (non-measurable!) set $N$ which has
$\lambda^*(N) = 1$. Therefore $P(D \cup C^c) = 1$ and $P(D) = 1$. In other words, for
almost all $\omega$ we get $P(N, \omega) = 1$, which implies the following
equality

$$\int_\Omega P(N, \omega)\, \lambda(d\omega) = 1.$$

However, this contradicts the relation $\int_\Omega P(N, \omega)\, \lambda(d\omega) = \tfrac{1}{2}$
obtained above.

Therefore a regular conditional probability need not always 
exist. 

Let us note that in this counterexample the role of the
non-measurable set $N$ is essential. Recall that the construction of $N$
relies on the axiom of choice. Using a weakened form of the axiom
of choice, Solovay (1970) derived several interesting results
concerning the measurability of sets, measures and their properties.

General results on the existence of regular conditional 
probabilities can be found in the works of Pfanzagl (1969), 
Blackwell and Dubins (1975) and Faden (1985). 




SECTION 3. INDEPENDENCE OF RANDOM EVENTS 


Let $(\Omega, \mathcal{F}, P)$ be a probability space. The events $A$ and $B$ of $\mathcal{F}$ are
said to be independent (with respect to $P$) if

$$P(AB) = P(A)P(B).$$

More generally, two classes of events (for example fields,
σ-fields), say $\mathcal{A}_1$ and $\mathcal{A}_2$, $\mathcal{A}_1, \mathcal{A}_2 \subset \mathcal{F}$, are called independent if any
two events $A_1$ and $A_2$, where $A_1 \in \mathcal{A}_1$, $A_2 \in \mathcal{A}_2$, are independent.

The concept of independence of two events or two classes of
events can be extended to any finite number of events or classes.
We say that the events $A_1, \ldots, A_n \in \mathcal{F}$ are mutually independent
if the following relation (product rule)

$$P(A_{i_1} A_{i_2} \cdots A_{i_k}) = P(A_{i_1}) P(A_{i_2}) \cdots P(A_{i_k}) \tag{1}$$

is satisfied for all $k$ and $i_1, i_2, \ldots, i_k$ where $k = 2, \ldots, n$ and
$1 \le i_1 < i_2 < \ldots < i_k \le n$. Thus for the mutual independence of $n$ events
all $2^n - n - 1$ relations (1) must be satisfied. If at least one relation
does not hold, the events are dependent. If all the relations (1) fail
to hold, we say that the events $A_1, \ldots, A_n$ are totally dependent. If
the product rule (1) holds only for $k = 2$, the events are pairwise
independent. Finally, if (1) holds for all $k$, $2 \le k \le m$, for some $m \le n$,
we have a set of $n$ events which are $m$-wise independent
(pairwise independent if $m = 2$ and mutually independent if $m = n$).

When considering the independence/dependence properties of
collections of random events it is natural to speak about the
product rule (1) at level $k$, that is, that (1) holds or does not hold for
any of the $\binom{n}{k}$ possible combinations ($k$-tuples) of events. Thus we
can characterize each level $k$, $k = 2, \ldots, n$, as being independent or
dependent. Some interesting (and even unusual) possibilities will
be illustrated in the examples below.
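
The level-by-level bookkeeping just described can be sketched in a few lines of Python. The helper name `level_report` and the small test space (two fair coin tosses with the events 'heads first', 'heads second', 'tosses agree') are illustrative assumptions, not part of the text; exact rational arithmetic is used so that the product rule is tested without rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def level_report(prob, events):
    """prob: dict outcome -> Fraction; events: list of sets of outcomes.

    Returns {k: (holds, total)}: how many of the C(n, k) product
    relations (1) at level k are satisfied.
    """
    def P(s):
        return sum((prob[w] for w in s), Fraction(0))

    report = {}
    for k in range(2, len(events) + 1):
        holds = total = 0
        for combo in combinations(events, k):
            total += 1
            if P(set.intersection(*combo)) == prod(P(e) for e in combo):
                holds += 1
        report[k] = (holds, total)
    return report


# Hypothetical illustration: two fair coin tosses.
prob = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}
A = {"HH", "HT"}   # heads at the first toss
B = {"HH", "TH"}   # heads at the second toss
C = {"HH", "TT"}   # the two tosses agree
report = level_report(prob, [A, B, C])
print(report)      # {2: (3, 3), 3: (0, 1)}
```

Level 2 is independent (all three pairs satisfy the product rule) while level 3 is dependent, which is exactly the pairwise-but-not-mutual pattern studied in Example 3.2.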

It is obvious how to define the independence of a finite number
of classes of events. If $A, B \in \mathcal{F}$ and $P(B) > 0$ we denote by $P(A|B)$
the conditional probability of $A$ given $B$ and put

$$P(A|B) = P(AB)/P(B).$$


The independence of two events can easily be expressed through 
conditional probabilities. Another notion, that of conditional 
independence, is considered in one of the examples. 

The examples included in this section aim to help the reader 
understand the meaning of the fundamental notion of independence 
more clearly. 


3.1. Random events with a different kind of dependence 


In a Bernoulli scheme with a parameter $p$ we shall consider two
events which, according to the value of $p$, are either independent or
dependent.

Let $H$ = {heads} and $T$ = {tails} be the outcomes at tossing a
coin with $P(H) = p$, $P(T) = 1 - p$, $0 \le p \le 1$. Toss the coin three
times independently and consider the events $A$ = {at most one
tails} and $B$ = {all tosses are the same}. Obviously $A$ = {HHH,
HHT, HTH, THH}, $B$ = {HHH, TTT}. Hence

$$P(A) = p^3 + 3p^2(1 - p), \quad P(B) = p^3 + (1 - p)^3, \quad P(AB) = p^3.$$

It is easy to see that the product rule

$$P(AB) = P(A)P(B)$$

holds in the trivial cases $p = 0$ and $p = 1$ and in the symmetric case
$p = \tfrac{1}{2}$. Hence the events $A$ and $B$ are independent if $p = 0$, or $p = 1$,
or $p = \tfrac{1}{2}$. For all other values of $p$ in the interval $[0, 1]$, $A$ and $B$ are
dependent events.
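
As a quick check of this claim, the following sketch enumerates the eight outcomes and evaluates $P(AB) - P(A)P(B)$ exactly on a grid of values of $p$. The function name `defect` and the grid $p = k/10$ are assumptions made only for this illustration.

```python
from fractions import Fraction
from itertools import product


def defect(p):
    """Return P(AB) - P(A)P(B) for three independent tosses with P(H) = p."""
    p_a = p_b = p_ab = Fraction(0)
    for w in product("HT", repeat=3):
        weight = Fraction(1)
        for c in w:
            weight *= p if c == "H" else 1 - p
        in_a = w.count("T") <= 1     # A: at most one tails
        in_b = len(set(w)) == 1      # B: all tosses are the same
        if in_a:
            p_a += weight
        if in_b:
            p_b += weight
        if in_a and in_b:
            p_ab += weight
    return p_ab - p_a * p_b


grid = [Fraction(k, 10) for k in range(11)]
independent_at = [p for p in grid if defect(p) == 0]
print(independent_at)   # the grid points 0, 1/2 and 1
```

The same conclusion follows algebraically, since $P(AB) - P(A)P(B) = 3p^2(1 - p)^2(2p - 1)$, which vanishes only at $p = 0$, $p = \tfrac{1}{2}$ and $p = 1$.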


3.2. The pairwise independence of random events does 
not imply their mutual independence 


It is natural to start with the first ever known examples showing the 
difference between the mutual and pairwise independence of 
random events. 

The two examples (i) and (ii) below, first presented by
Bohlmann (1908) and Bernstein (1928), were created in a period of
active studies in probability theory and its establishment as a
rigorous branch of mathematics.


(i) (Bohlmann 1908). Suppose we have at our disposal 16 capsules
with no difference between them. In each capsule we insert three
small balls labelled a, b, c and each ball is either white or black.
The capsules are put in an urn, mixed well, and we choose
randomly one capsule. We open this capsule to see what is inside,
that is what is the outcome of our experiment. We are interested in
the property denoted by $(a_1, a_2, a_3)$ where $a_j = 1$ if a white ball is
at position $j$ and $a_j = 0$ if that ball is black, $j = 1, 2, 3$. The question
is: what kind of dependence exists between $a_1$, $a_2$ and $a_3$?


Clearly, this original and illuminating description is equivalent
to considering an urn with 16 capsules marked (inside) as follows:
three capsules by 111; three capsules by 100; three capsules by
010; three capsules by 001, and each of the marks 110, 101, 011
and 000 is used just once among the remaining four capsules. We
choose one capsule at random and consider the following events:

$$A_j = \{\text{'1' at the } j\text{th position}\}, \quad j = 1, 2, 3$$

(equivalently $A_j = \{a_j = 1\}$, $j = 1, 2, 3$).
We easily find that $P(A_1) = \tfrac{1}{2}$, $P(A_2) = \tfrac{1}{2}$, $P(A_3) = \tfrac{1}{2}$ and then

$$P(A_1 A_2) = \tfrac{1}{4}, \quad P(A_1 A_3) = \tfrac{1}{4}, \quad P(A_2 A_3) = \tfrac{1}{4},$$

implying that the events $A_1, A_2, A_3$ are (at least) pairwise
independent.
However

$$P(A_1 A_2 A_3) = \tfrac{3}{16} \ne \tfrac{1}{8} = P(A_1)P(A_2)P(A_3)$$

and hence these events are not mutually independent.


(ii) (Bernstein 1928). Suppose a box contains four tickets labelled
112, 121, 211, 222. Choose one ticket at random and consider the
events $A_1$ = {1 occurs in the first place}, $A_2$ = {1 occurs in the
second place} and $A_3$ = {1 occurs in the third place}. Obviously
$P(A_1) = P(A_2) = P(A_3) = \tfrac{1}{2}$ and

$$P(A_1 A_2) = P(A_1 A_3) = P(A_2 A_3) = \tfrac{1}{4}.$$

This means that the three events $A_1, A_2, A_3$ are pairwise
independent. However,

$$P(A_1 A_2 A_3) = 0 \ne \tfrac{1}{8} = P(A_1)P(A_2)P(A_3)$$

and hence these events are not mutually independent.
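
Both classical urns are small enough to be checked by direct counting. The sketch below is only an illustration: the function name `place_events` and the encoding of the urns as dictionaries (label to number of capsules or tickets) are assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations


def place_events(urn, symbol):
    """urn: dict label -> multiplicity.

    Returns the probabilities of A_j = {symbol at place j}, j = 1, 2, 3,
    of the three pairwise intersections, and of the triple intersection.
    """
    total = sum(urn.values())

    def P(places):
        hits = sum(m for label, m in urn.items()
                   if all(label[j] == symbol for j in places))
        return Fraction(hits, total)

    singles = [P([j]) for j in range(3)]
    pairs = [P(list(c)) for c in combinations(range(3), 2)]
    return singles, pairs, P([0, 1, 2])


bohlmann = {"111": 3, "100": 3, "010": 3, "001": 3,
            "110": 1, "101": 1, "011": 1, "000": 1}
bernstein = {"112": 1, "121": 1, "211": 1, "222": 1}

for name, urn in (("Bohlmann", bohlmann), ("Bernstein", bernstein)):
    singles, pairs, triple = place_events(urn, "1")
    print(name, singles, pairs, triple)
```

In both cases every single probability is 1/2 and every pairwise probability is 1/4, while the triple intersection has probability 3/16 (Bohlmann) or 0 (Bernstein) instead of 1/8.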


(iii) Consider the six permutations of the letters a, b, c as well as
the triplets (a, a, a), (b, b, b) and (c, c, c). Let $\Omega$ consist of these
nine triplets as points, and let each have probability $\tfrac{1}{9}$. Define the
events $A_k$ = {the $k$th place is occupied by the letter a}, $k = 1, 2, 3$.
Then obviously

$$P(A_1) = P(A_2) = P(A_3) = \tfrac{1}{3}, \quad P(A_1 A_2) = P(A_1 A_3) = P(A_2 A_3) = \tfrac{1}{9}$$

and hence the events $A_1, A_2, A_3$ are pairwise independent.
However, they are not mutually independent, since $A_1 A_2 \subset A_3$,
which implies that

$$P(A_1 A_2 A_3) = \tfrac{1}{9} \ne \tfrac{1}{27} = P(A_1)P(A_2)P(A_3).$$

The same idea can be generalized as follows. Let $\Omega$ contain $n! + n$
points, namely the $n!$ permutations of the symbols $a_1, \ldots, a_n$
and the $n$ repetitions of the same symbol $a_k$, $k = 1, \ldots, n$. Suppose
that each of the permutations has probability $1/[n^2 (n - 2)!]$ while
each of the repetitions has probability $1/n^2$. Then it is not difficult
to check that the events $A_k$ = {$a_1$ occurs at the $k$th place}, $k = 1, \ldots, n$,
are pairwise independent, but no three of them are mutually
independent.
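
The 'not difficult to check' step can be carried out mechanically for a concrete $n$. The sketch below does so for $n = 4$; the choice $n = 4$, the helper names `build_space` and `P`, and the coding of the symbols $a_1, \ldots, a_n$ as the integers $1, \ldots, n$ are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial


def build_space(n):
    """Permutations get mass 1/[n^2 (n-2)!], repetitions get mass 1/n^2."""
    symbols = tuple(range(1, n + 1))        # symbol 1 plays the role of a_1
    prob = {perm: Fraction(1, n * n * factorial(n - 2))
            for perm in permutations(symbols)}
    for s in symbols:
        prob[(s,) * n] = Fraction(1, n * n)
    return prob


def P(prob, places):
    """Probability that symbol 1 occupies every place listed in `places`."""
    return sum((pr for w, pr in prob.items()
                if all(w[k] == 1 for k in places)), Fraction(0))


n = 4
prob = build_space(n)
total = sum(prob.values())
pairwise = all(P(prob, c) == P(prob, c[:1]) * P(prob, c[1:])
               for c in combinations(range(n), 2))
triples = [P(prob, c) == Fraction(1, n ** 3)
           for c in combinations(range(n), 3)]
print(total, pairwise, any(triples))
```

Each $A_k$ has probability $1/n$ and each pair has probability $1/n^2$, but every triple intersection also has probability $1/n^2$ (only the repetition of $a_1$ contributes), which differs from $1/n^3$.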


(iv) Let $A_1, A_2, A_3$ be independent events each of probability $\tfrac{1}{2}$ and
put $A_{ij} = (A_i \triangle A_j)^c$ where $\triangle$ denotes the symmetric difference of
two sets: $A_i \triangle A_j = A_i A_j^c + A_i^c A_j$ or, equivalently,
$A_i \triangle A_j = (A_i \setminus A_j) \cup (A_j \setminus A_i)$. (In particular, we could consider
the following simple experiment: three symmetric coins numbered
1, 2, 3 are tossed; then $A_i$ = {coin $i$ falls heads}, $A_{ij}$ = {coins $i$ and $j$
agree}.) Then the events $A_{12}, A_{13}, A_{23}$ are not mutually
independent, though they are independent in pairs.


(v) Let $\mathcal{L}$ be the set of all $n^3$ three-letter words $s$ of a language and
all words are equally likely. Define the events $A$, $B$ and $C$ as
follows:

$A = \{s \in \mathcal{L} : s$ begins with a specified letter, say x$\}$,
$B = \{s \in \mathcal{L} : s$ has the letter x in the middle$\}$,
$C = \{s \in \mathcal{L} : s$ has two of its letters the same$\}$.

Then $A$, $B$ and $C$ are pairwise but not mutually independent.


3.3. The relation P(ABC) = P(A) P(B) P(C) does not 
always imply the mutual independence of the events 
A, B,C 


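(i) Let two dice be tossed, $\Omega$ = all ordered pairs $ij$, $i, j = 1, \ldots, 6$,
and each point of $\Omega$ has probability $\tfrac{1}{36}$. Consider the events:

$A$ = {first die = 1, 2 or 3},
$B$ = {first die = 3, 4 or 5},
$C$ = {the sum of two faces is 9}.

Obviously we have $AB$ = {31, 32, 33, 34, 35, 36}, $AC$ = {36}, $BC$
= {36, 45, 54}, $ABC$ = {36}. Then $P(A) = \tfrac{1}{2}$, $P(B) = \tfrac{1}{2}$, $P(C) = \tfrac{1}{9}$ and

$$P(ABC) = \tfrac{1}{36} = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{9} = P(A)P(B)P(C).$$

Nevertheless the events $A$, $B$, $C$ are not mutually independent,
since

$$P(AB) = \tfrac{1}{6} \ne \tfrac{1}{4} = P(A)P(B),$$
$$P(AC) = \tfrac{1}{36} \ne \tfrac{1}{18} = P(A)P(C),$$
$$P(BC) = \tfrac{1}{12} \ne \tfrac{1}{18} = P(B)P(C).$$

In other words, independence at level 3 does not imply
independence at level 2.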
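
The dice example is easy to confirm by enumerating the 36 outcomes. The following is a minimal sketch; the helper `P` and the encoding of the events as predicates on pairs $(i, j)$ are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # ordered pairs (i, j)


def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


A = lambda w: w[0] in (1, 2, 3)
B = lambda w: w[0] in (3, 4, 5)
C = lambda w: w[0] + w[1] == 9

level3 = P(lambda w: A(w) and B(w) and C(w)) == P(A) * P(B) * P(C)
level2 = {
    "AB": P(lambda w: A(w) and B(w)) == P(A) * P(B),
    "AC": P(lambda w: A(w) and C(w)) == P(A) * P(C),
    "BC": P(lambda w: B(w) and C(w)) == P(B) * P(C),
}
print(level3, level2)
# True {'AB': False, 'AC': False, 'BC': False}
```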


(ii) Let $\Omega = \{1, 2, 3, 4, 5, 6, 7, 8\}$ where each outcome has
probability 1/8. Consider the events $B_1 = \{1, 2, 3, 4\}$, $B_2 = B_3 =
\{1, 5, 6, 7\}$. Then $P(B_1) = P(B_2) = P(B_3) = \tfrac{1}{2}$, $B_1 B_2 B_3 = \{1\}$ and
thus $P(B_1 B_2 B_3) = \tfrac{1}{8} = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = P(B_1)P(B_2)P(B_3)$.
However, the events $B_2$ and $B_3$ are not independent and hence the
three events are not mutually independent.


(iii) Let the space $\Omega$ be partitioned into five events, say $A_1, A_2, A_3,
A_4, A_5$, such that $P(A_1) = P(A_2) = P(A_3) = 15/64$, $P(A_4) = 1/64$,
$P(A_5) = 18/64$. Define three new events, namely
$B = A_1 \cup A_4$, $C = A_2 \cup A_4$, $D = A_3 \cup A_4$. Then $P(B) = P(C) =
P(D) = 1/4$, $P(BCD) = 1/64$: that is, $P(BCD) = P(B)P(C)P(D)$.
However, $P(BC) = P(A_4) = 1/64 \ne 1/16 = P(B)P(C)$ and hence the
events $B$, $C$, $D$ are not mutually independent.


3.4. A collection of n + 1 dependent events such that any
n of them are mutually independent


(i) A symmetric coin is tossed independently $n$ times. Consider the
events $A_k$ = {heads at the $k$th tossing}, for $k = 1, \ldots, n$, and $A_{n+1}$
= {the sum of the heads in these $n$ tossings is even}. Then
obviously

$$P(A_1) = \cdots = P(A_n) = \frac{1}{2}, \quad P(A_{n+1}) = \frac{1}{2^n}\left[\binom{n}{0} + \binom{n}{2} + \cdots\right] = \frac{2^{n-1}}{2^n} = \frac{1}{2}.$$

It is easy to see that the conditional probability
$P(A_{n+1}|A_1 \cdots A_n) = 1$ if $n$ is even, and 0 if $n$ is odd. This implies
that the equality

$$P(A_1 \cdots A_n A_{n+1}) = P(A_1) \cdots P(A_n) P(A_{n+1})$$

is impossible because the right-hand side is $2^{-(n+1)}$ and the
left-hand side
$P(A_1 \cdots A_n A_{n+1}) = P(A_1 \cdots A_n) P(A_{n+1}|A_1 \cdots A_n) = 2^{-n}$ if $n$ is
even, and 0 if $n$ is odd. Therefore $A_1, \ldots, A_n, A_{n+1}$ cannot be a
collection of mutually independent events.
Now take any $n$ of these events. If we have chosen $A_1, \ldots, A_n$,
they are independent, since for any $A_{i_1}, \ldots, A_{i_k}$, $2 \le k \le n$, we have
$P(A_{i_1} \cdots A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k})$. It remains to consider the choice
of $n$ events including $A_{n+1}$ and $n - 1$ events taken from $A_1, \ldots,
A_n$, for example $A_2, A_3, \ldots, A_n, A_{n+1}$. For their mutual
independence it suffices to check that

$$P(A_{i_1} \cdots A_{i_m} A_{n+1}) = P(A_{i_1}) \cdots P(A_{i_m}) P(A_{n+1}) \tag{1}$$

where $1 \le m \le n - 1$ and $i_1, \ldots, i_m$ are among $2, \ldots, n$. We have
$P(A_{i_1}) = \cdots = P(A_{i_m}) = P(A_{n+1}) = \tfrac{1}{2}$ and thus the right-hand side
of (1) is $2^{-(m+1)}$. Further,

$$P(A_{i_1} \cdots A_{i_m} A_{n+1}) = P(A_{i_1} \cdots A_{i_m}) P(A_{n+1}|A_{i_1} \cdots A_{i_m}) = 2^{-m} \cdot 2^{-1} = 2^{-(m+1)}.$$

Thus (1) is satisfied and therefore any $n$ events among the given
$n + 1$ events are mutually independent. In other words, the
dependent $n + 1$ events $A_1, \ldots, A_{n+1}$ are $n$-wise independent.


We can conclude that if we have $n + 1$ events and any $n$ of them
are mutually independent, this does not always imply that the
given events are mutually independent. Clearly this is a
generalization of the Bernstein example (see Example 3.2(ii)).
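
For small $n$ the whole statement can be verified by brute force over the $2^n$ equally likely outcomes. In the sketch below the values $n = 3$ and $n = 4$ and the helper names `mutually_independent` and `parity_family` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod


def mutually_independent(events, size):
    """Check every product relation on a uniform space with `size` points."""
    P = lambda s: Fraction(len(s), size)
    for k in range(2, len(events) + 1):
        for combo in combinations(events, k):
            if P(set.intersection(*combo)) != prod(P(e) for e in combo):
                return False
    return True


def parity_family(n):
    """A_1..A_n: heads at toss k; A_{n+1}: the number of heads is even."""
    omega = list(product((0, 1), repeat=n))          # 1 stands for heads
    events = [{w for w in omega if w[k] == 1} for k in range(n)]
    events.append({w for w in omega if sum(w) % 2 == 0})
    return omega, events


results = {}
for n in (3, 4):
    omega, events = parity_family(n)
    any_n = all(mutually_independent(list(sub), len(omega))
                for sub in combinations(events, n))
    results[n] = (any_n, mutually_independent(events, len(omega)))
print(results)   # {3: (True, False), 4: (True, False)}
```

The first entry of each pair says that every choice of $n$ events is mutually independent; the second says that the full collection of $n + 1$ events is not.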


(ii) We are given $n + 1$ points in the plane, say $M_1, \ldots, M_{n+1}$,
which are in a general position (no three of them lie in a straight
line). Join up the points in pairs and obtain $\binom{n+1}{2}$ segments. Then
we put a pointer to each of the segments by tossing a symmetric
coin $\binom{n+1}{2}$ times: if we consider the segment $M_i M_j$, $i < j$, and the
result of the tossing is heads, we put a pointer from $M_i$ to $M_j$; if
tails, the pointer goes from $M_j$ to $M_i$. Consider $n + 1$ events $A_1, \ldots,
A_{n+1}$, where

$A_k$ = {the number of pointers going to $M_k$ is even}, $k = 1, \ldots, n + 1$.

Then for each $k$, $2 \le k \le n$, and any $1 \le i_1 < i_2 < \ldots < i_k \le n + 1$, the
events $A_{i_1}, A_{i_2}, \ldots, A_{i_k}$ are mutually independent. However $A_1, \ldots,
A_{n+1}$ are dependent and so we have another collection of $n + 1$
dependent events which are $n$-wise independent.


3.5. Collections of random events with ‘unusual’ 
independence/dependence properties 


Let us describe a few probability models and collections of random
events with specific properties.


(i) Suppose that the sample space of an experiment is $\Omega = \{1, 2, 3,
4, 5, 6, 7, 8\}$ with probabilities $p_k = P(\{k\})$ defined as follows:


eo —, = _. 7-160 —_ nm. — wr, — 1+8a4 _ |] 
P= 4, Po = Ps = 4 = ey Pe PO HK PT HK oy PS — s 


where @ is an arbitrary number in the interval (0, 35). Consider the 


events 
A, = {2, 5, 6, 8$, Ao = {3, 5, 6, 8}, Az = {4, 6, 7, 8}. 
We easily find that P(41) = P(A2) = P(43) = 4 and then 


P(A, Ag) = = P(A, A3) = P(A2A3) — it 


Hence the events A;, Az, A3 are independent at level 3 for any 
value of the parameter a € (0, =). ta 4 they are independent 
at level 2 and this is the only case when these three events are 
mutually independent. 


(ii) Let $\Omega = \{1, 2, 3, 4, 5, 6\}$ with
Pi = a: Po = p3 = Pa = PS = a, Pe = Z- Consider the 
following events: 


Al Vey 3A = (2,59 As = tly 2d = 1g Sy, 
5}. 


Then P(41) = P(42) = P(43) = P(44) = 4 and we find further that 


P(A;A;) =X, alli<j; P(AjA;A)) = 3, alli<j<l; P(AyA2A3A4)= +. 


ik? 24 


Therefore these four events are independent at level 4 but they are 
dependent at level 2 and dependent at level 3. 


(iii) Take a sample space $\Omega$ containing $|\Omega| = 16$ outcomes denoted
by 1, 2, ..., 16, each having the same probability $\tfrac{1}{16}$. Consider the
events:

$A = \{2, 3, 4, 5, 6, 9, 13, 16\}$, $B = \{4, 7, 8, 10, 11, 13, 14, 16\}$,
$C = \{4, 6, 7, 8, 10, 11, 13, 14\}$, $D = \{3, 4, 5, 6, 9, 10, 15, 16\}$.

Then $P(A) = P(B) = P(C) = P(D) = \tfrac{1}{2}$ and since $ABCD = \{4\}$ we
have

$$\tfrac{1}{16} = P(ABCD) = P(A)P(B)P(C)P(D)$$

and hence the product rule is satisfied at level 4. Further, $ABC =
\{4, 13\}$, implying that

$$\tfrac{1}{8} = P(ABC) = P(A)P(B)P(C)$$

and similarly the product rule holds for any of the remaining three
possible triplets of events. It turns out, however, that the product
rule does not hold for any of the $6 = \binom{4}{2}$ possible pairs of events. In
particular, $CD = \{4, 6, 10\}$ and

$$\tfrac{3}{16} = P(CD) \ne P(C)P(D) = \tfrac{1}{4}.$$

Thus the events $A$, $B$, $C$, $D$ are independent at level 4,
independent at level 3 and (completely) dependent at level 2.
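
The three levels of this example can be tabulated mechanically. The sketch below assumes the four events are A = {2, 3, 4, 5, 6, 9, 13, 16}, B = {4, 7, 8, 10, 11, 13, 14, 16}, C = {4, 6, 7, 8, 10, 11, 13, 14} and D = {3, 4, 5, 6, 9, 10, 15, 16}; this reading is an assumption, chosen because it agrees with the stated intersections ABCD = {4}, ABC = {4, 13} and CD = {4, 6, 10}.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Assumed reading of the four events (see the remark above).
events = {
    "A": {2, 3, 4, 5, 6, 9, 13, 16},
    "B": {4, 7, 8, 10, 11, 13, 14, 16},
    "C": {4, 6, 7, 8, 10, 11, 13, 14},
    "D": {3, 4, 5, 6, 9, 10, 15, 16},
}


def P(s):
    return Fraction(len(s), 16)       # 16 equally likely outcomes


holds = {}
for k in (2, 3, 4):
    holds[k] = [
        P(set.intersection(*(events[name] for name in combo)))
        == prod(P(events[name]) for name in combo)
        for combo in combinations("ABCD", k)
    ]
summary = {k: (sum(v), len(v)) for k, v in holds.items()}
print(summary)   # {2: (0, 6), 3: (4, 4), 4: (1, 1)}
```

None of the six pairs satisfies the product rule, while all four triplets and the single quadruple do.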

(iv) Suppose the space $\Omega$ consists of $|\Omega| = 12$ outcomes, say 1, 2, ...,
12, with the following probabilities:


se tell, teehee nite eects aoa pall 
Pl = jg Pa — Ps — Pa — PS = 5G: 

he Peace a aioe dai tae iat he CosheRting tak Eh Git me 

Pe Pr = Pea Pe = Pie FH Pil aes Pi = ae 


Define the events $B_1, B_2, B_3, B_4$ as follows:

$B_1 = \{1, 2, 3, 4, 6, 7, 8\}$, $B_2 = \{1, 2, 3, 5, 6, 9, 10\}$,
$B_3 = \{1, 2, 4, 5, 7, 9, 11\}$, $B_4 = \{1, 3, 4, 5, 8, 10, 11\}$.

Standard reasoning leads to the following conclusion: the events
$B_1, B_2, B_3, B_4$ are independent at level 4, dependent at level 3 and
independent at level 2. (The details are left to the reader.)


3.6. Is there a relationship between conditional and 
mutual independence of random events? 


The random events $A_1, A_2, \ldots, A_n$ are called conditionally
independent given event $B$ with $P(B) > 0$ if

$$P(A_1 A_2 \cdots A_n|B) = P(A_1|B) P(A_2|B) \cdots P(A_n|B). \tag{1}$$

We want to examine the relationship between the two concepts
mutual independence and conditional independence.


(i) Suppose we have at our disposal two coins, say a and b. Let $p_a$
and $p_b$, $p_a \ne p_b$, be the probabilities of heads for a and b
respectively. Select a coin at random and toss it twice. Consider
the events $A_1$ = {heads at the first tossing}, $A_2$ = {heads at the
second tossing} and $B$ = {coin a is selected}. Then $P(A_1 A_2|B) =
p_a^2$, $P(A_1|B) = p_a$, $P(A_2|B) = p_a$. Hence $P(A_1 A_2|B) =
P(A_1|B)P(A_2|B)$, and the events $A_1$ and $A_2$ are conditionally
independent given $B$. However,

$$P(A_1 A_2) = \tfrac{1}{2} p_a^2 + \tfrac{1}{2} p_b^2, \quad P(A_1) = \tfrac{1}{2}(p_a + p_b), \quad P(A_2) = \tfrac{1}{2}(p_a + p_b)$$

and since $p_a \ne p_b$ the equality $P(A_1 A_2) = P(A_1)P(A_2)$ is not
satisfied: indeed, $P(A_1 A_2) - P(A_1)P(A_2) = \tfrac{1}{4}(p_a - p_b)^2 > 0$.

Therefore the events $A_1$ and $A_2$ are not independent, despite
their conditional independence.


(ii) A symmetric coin is tossed twice. Consider the events $A_k$ =
{heads at the $k$th tossing}, $k = 1, 2$, and $B$ = {at least one tails}. Then
$P(A_1) = P(A_2) = \tfrac{1}{2}$, $P(A_1 A_2) = \tfrac{1}{4}$ and hence the events $A_1$ and $A_2$
are independent. Further, it is easy to see that $P(A_1|B) = P(A_2|B) =
\tfrac{1}{3}$. However $P(A_1 A_2|B) = 0$ and (1) fails to hold. Therefore the
independent events $A_1$ and $A_2$ are not conditionally independent
given $B$.

The final conclusion is that there is no relationship between
conditional independence and mutual independence, that is neither
one of these properties implies the other. (See also Example 7.14.)
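
Both directions can be checked by exact enumeration. In the sketch below the concrete values $p_a = 1/3$ and $p_b = 2/3$, as well as the helper names `P` and `cond`, are assumptions chosen only for illustration; any $p_a \ne p_b$ would serve in part (i).

```python
from fractions import Fraction
from itertools import product


def P(prob, event):
    return sum((pr for w, pr in prob.items() if event(w)), Fraction(0))


def cond(prob, event, given):
    return P(prob, lambda w: event(w) and given(w)) / P(prob, given)


# (i) Two coins; outcome = (coin, first toss, second toss).
heads = {"a": Fraction(1, 3), "b": Fraction(2, 3)}   # assumed p_a, p_b
space1 = {}
for coin, t1, t2 in product("ab", "HT", "HT"):
    q = heads[coin]
    space1[(coin, t1, t2)] = (Fraction(1, 2)
                              * (q if t1 == "H" else 1 - q)
                              * (q if t2 == "H" else 1 - q))
A1 = lambda w: w[1] == "H"
A2 = lambda w: w[2] == "H"
B = lambda w: w[0] == "a"
both = lambda w: A1(w) and A2(w)
cond_indep_1 = cond(space1, both, B) == cond(space1, A1, B) * cond(space1, A2, B)
indep_1 = P(space1, both) == P(space1, A1) * P(space1, A2)

# (ii) One symmetric coin tossed twice; outcome = (first toss, second toss).
space2 = {w: Fraction(1, 4) for w in product("HT", repeat=2)}
H1 = lambda w: w[0] == "H"
H2 = lambda w: w[1] == "H"
some_tails = lambda w: "T" in w
hh = lambda w: H1(w) and H2(w)
indep_2 = P(space2, hh) == P(space2, H1) * P(space2, H2)
cond_indep_2 = (cond(space2, hh, some_tails)
                == cond(space2, H1, some_tails) * cond(space2, H2, some_tails))

print(cond_indep_1, indep_1)   # True False
print(indep_2, cond_indep_2)   # True False
```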


3.7. Independence type conditions which do not imply 
the mutual independence of a set of events 


Suppose the random events $A_1, A_2, \ldots, A_n$ satisfy the conditions

$$P(A_k) = p_k, \quad P(A_1 A_2 \cdots A_k) = p_1 p_2 \cdots p_k, \quad k = 1, 2, \ldots, n, \tag{1}$$

which could be called independence-type conditions. In (1) $p_1, \ldots,
p_n$ are arbitrary numbers in the interval (0, 1).


Obviously, if $n = 2$ and (1) is satisfied, this is merely the
definition of the independence of two random events $A_1$ and $A_2$.
We ask the following question: does (1) imply, in the general case
when $n \ge 3$, that the given events are mutually independent? Of
course, it is clear that (1) is much less than the standard condition
for mutual independence. Thus we can expect that the answer to
this question is negative. Let us illustrate the truth of this with the
following example, considering for simplicity the case $n = 3$.
Suppose $A_1, A_2, A_3$ are random events such that

$$P(A_1 A_2 A_3) = P(A_1 A_2 A_3^c) = \tfrac{1}{8}, \quad P(A_1 A_2^c A_3) = P(A_1^c A_2 A_3) = \tfrac{1}{8} - \varepsilon,$$
$$P(A_1 A_2^c A_3^c) = P(A_1^c A_2 A_3^c) = \tfrac{1}{8} + \varepsilon, \quad P(A_1^c A_2^c A_3) = \tfrac{1}{8} + 2\varepsilon,$$
$$P(A_1^c A_2^c A_3^c) = \tfrac{1}{8} - 2\varepsilon,$$

where $0 < \varepsilon < \tfrac{1}{16}$. We can easily check that

$$P(A_1) = P(A_2) = P(A_3) = \tfrac{1}{2}, \quad P(A_1 A_2) = \tfrac{1}{4}, \quad P(A_1 A_2 A_3) = \tfrac{1}{8}$$

and thus the conditions in (1) hold. For the mutual independence of
$A_1, A_2, A_3$, the equalities $P(A_1 A_3) = P(A_1)P(A_3)$ and $P(A_2 A_3) =
P(A_2)P(A_3)$ must also be satisfied. However,

$$P(A_1 A_3) = \tfrac{1}{4} - \varepsilon \ne \tfrac{1}{4} = P(A_1)P(A_3).$$

Hence the independence-type conditions (1) are satisfied for the
events $A_1, A_2$ and $A_3$, but these events are not mutually
independent.
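
The arithmetic of this example can be replayed with exact fractions. The sketch below assumes the eight atoms carry the masses 1/8, 1/8, 1/8 − ε, 1/8 − ε, 1/8 + ε, 1/8 + ε, 1/8 + 2ε and 1/8 − 2ε in the order indicated in the code comments, and uses the illustrative value ε = 1/32; the helper name `P` is also an assumption.

```python
from fractions import Fraction

eps = Fraction(1, 32)      # illustrative choice; any value in (0, 1/16) works
e = Fraction(1, 8)

# Atom (i, j, k): a 1 means the event occurs, a 0 means its complement occurs.
atoms = {
    (1, 1, 1): e,           (1, 1, 0): e,
    (1, 0, 1): e - eps,     (0, 1, 1): e - eps,
    (1, 0, 0): e + eps,     (0, 1, 0): e + eps,
    (0, 0, 1): e + 2 * eps,
    (0, 0, 0): e - 2 * eps,
}


def P(*idx):
    """Probability that all events with the given 0-based indices occur."""
    return sum((pr for w, pr in atoms.items()
                if all(w[i] == 1 for i in idx)), Fraction(0))


print(sum(atoms.values()))          # 1
print(P(0), P(1), P(2))             # 1/2 1/2 1/2
print(P(0, 1), P(0, 1, 2))          # 1/4 1/8
print(P(0, 2), P(1, 2))             # 7/32 7/32
```

With this choice the conditions (1) hold, yet $P(A_1 A_3) = P(A_2 A_3) = \tfrac{1}{4} - \varepsilon$, so two of the pairwise product relations fail.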


3.8. Mutually independent events can form families 
which are strongly dependent 

Choose a number $x$ at random in the interval [0, 1] and consider the
expansions of $x$ in bases 2, 3, .... Denote by $\mathcal{A}_k$, $k = 2, 3, \ldots$, the
family of sets $A_n^{(k)}$, $n = 1, 2, \ldots$, containing all points $x$ whose $n$th
digit in the expansion in base $k$ is equal to zero. Then for every
fixed $k$ the events $A_n^{(k)}$, $n = 1, 2, \ldots$, are mutually independent.
This is easily checked, but for details see Neuts (1973) or
Billingsley (1995).

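We want to know whether the families $\mathcal{A}_k$, $k = 2, 3, \ldots$, are
independent. To see this, take the events

$$A_1^{(2)}, A_1^{(3)}, A_1^{(4)}, \ldots$$

which are representatives of the families $\mathcal{A}_2, \mathcal{A}_3, \mathcal{A}_4, \ldots$
respectively. On the one hand, for any $n > 2$,

$$P\left(\bigcap_{k=2}^{n} A_1^{(k)}\right) = P\left(x < \frac{1}{n}\right) = \frac{1}{n}$$

because the first digit in the number base $k$ is 0 iff $x < 1/k$ for $k = 2,
3, \ldots, n$. However, on the other hand,

$$\prod_{k=2}^{n} P\left(A_1^{(k)}\right) = \frac{1}{2} \cdot \frac{1}{3} \cdots \frac{1}{n} = \frac{1}{n!} \ne \frac{1}{n} = P\left(\bigcap_{k=2}^{n} A_1^{(k)}\right).$$

Therefore the families $\mathcal{A}_k$, $k = 2, 3, \ldots$, are not independent,
although they are generated by mutually independent events.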


3.9. Independent classes of random events can generate
σ-fields which are not independent


Let $(\Omega, \mathcal{F}, P)$ be the standard probability space: $\Omega = [0, 1]$, $\mathcal{F}$ is the
Borel σ-field $\mathcal{B}_{[0,1]}$ generated by the subsets of $\Omega$ and $P$ is the
Lebesgue measure. Consider the following two classes of random
events: $\mathcal{A}_1 = \{A_{11}, A_{12}\}$ and $\mathcal{A}_2 = \{A_2\}$ where


Then P(A;1) = $, P(Ai2) = 4, P(A2) = $. Hence 


— 


Therefore the classes A, and A» are independent. 
It is easy to see that the o-fields o(A,) and o(A») generated by 
A, and A» are not independent. E.g. if Ay = Ay] Ajo, then 


P(A,) = 4, A, Aa = [0, +) and 
P(A, Ao) = + 4 4-4 = P(A)P(A2). 


A similar example can be given in the discrete case. It is enough 
to take e.g. the sample space Q = {1, 2, 3, 4} with equally likely 
outcomes and two classes A; and A» where A, contains one of the 
outcomes of Q and A» contains two of them. A simple calculation 
leads to a conclusion like that presented above. 

Let us note finally that $\sigma(\mathcal{A}_1)$ and $\sigma(\mathcal{A}_2)$ would be independent
if each of $\mathcal{A}_1$ and $\mathcal{A}_2$ were a π-system, i.e. $\Omega \in \mathcal{A}_j$ and $\mathcal{A}_j$, $j = 1, 2$,
is closed under intersection.


SECTION 4. DIVERSE PROPERTIES OF RANDOM 
EVENTS AND THEIR PROBABILITIES 


Here we introduce and analyse some other properties of random 
events and probabilities. The corresponding definitions are given 
in the examples themselves. This section is a natural continuation 
and an extension of the ideas treated in the previous sections. 


4.1. Probability spaces without non-trivial independent 
events: totally dependent spaces 


Let $(\Omega, \mathcal{F}, P)$ be a probability space. Recall that the events $A, B \in
\mathcal{F}$ are non-trivial and independent if $0 < P(A) < 1$, $0 < P(B) < 1$
and $P(AB) = P(A)P(B)$. One might think that every probability
space contains non-trivial independent events. However, this
conjecture is false.


(i) Let $\Omega$ be a finite set, $\Omega = \{\omega_1, \ldots, \omega_n\}$ and

$$P(\{\omega_1\}) = 1 - (n - 1)\varepsilon, \quad P(\{\omega_2\}) = P(\{\omega_3\}) = \cdots = P(\{\omega_n\}) = \varepsilon$$

where $\varepsilon$ is an irrational number, $0 < \varepsilon < (n - 1)^{-1}$. Suppose there
exists a pair $A$, $B$ of non-trivial independent events. We have the
following three possibilities: (1) $\omega_1 \notin A$, $\omega_1 \notin B$; (2) $\omega_1 \notin A$, $\omega_1 \in
B$, or conversely; (3) $\omega_1 \in A$, $\omega_1 \in B$. We can easily verify that the
independence condition is not satisfied in any of the cases (1), (2)
or (3). For example, consider case (2). Here $A$ contains some $k$
outcomes taken from $\omega_2, \ldots, \omega_n$ and $B$ consists of $\omega_1$ and some $l$
outcomes taken from $\omega_2, \ldots, \omega_n$. Then the intersection $AB$
contains elements taken only from $\omega_2, \ldots, \omega_n$. Let their number
be $m$, $m \le k$. We obtain the following equality:

$$m\varepsilon = [1 - (n - 1)\varepsilon + l\varepsilon]\, k\varepsilon.$$

It follows that $\varepsilon = (k - m)/[k(n - 1 - l)]$, which contradicts the
assumption that $\varepsilon$ is irrational.
Similar reasoning can be used in cases (1) and (3). Therefore, in
this example non-trivial independent events do not exist. Moreover,
it can be shown that more than two non-trivial independent events also
do not exist. Notice that here $\Omega$ is a finite set.


(ii) In case (i) $\Omega$ was a finite set. Let now $\Omega$ be a countably infinite
set, $\Omega = \{\omega_1, \omega_2, \ldots\}$, and let

$$P(\{\omega_k\}) = 2^{-k!}, \quad k = 2, 3, \ldots, \quad P(\{\omega_1\}) = \varepsilon \quad \text{with } \varepsilon = 1 - \sum_{k=2}^{\infty} P(\{\omega_k\}).$$

Note that the latter infinite series is convergent and its sum $\varepsilon$ is a
number in (0, 1) and it is crucial for further reasoning that $\varepsilon$ is an
irrational number (in fact, $\varepsilon$ is also transcendental; $\varepsilon$ is a Liouville
number). It can be shown that any finite or infinite collection of
arbitrarily composed random events is totally dependent.


(iii) In cases (i) and (ii) above we have described probability
spaces with total dependence of their events, no matter how they
are defined. In such a case we use the term totally dependent
probability space. Notice, however, that in (i) and (ii) $\Omega$ is a
discrete set and the probability measure $P$ is purely discrete. Hence
there are purely discrete probability spaces which are totally
dependent. This immediately leads to the question: is it possible
for a non-purely discrete probability space to be totally dependent?

Recall that 'non-purely discrete' means that $P$ is not just a sum
of 'atoms', as in cases (i) and (ii) above. Now we assume that there
is a subset $\Omega_c \subset \Omega$ with $P(\Omega_c) > 0$ and such that the restriction
$P|\Omega_c$ of $P$ on $\Omega_c$ is non-atomic: $P(\{\omega\}) = 0$ for each $\omega \in \Omega_c$. Let
$P(\Omega_c) = c$ where obviously $0 < c \le 1$. Let us clarify if such a space
can be totally dependent. For this we need the following result
known as the Lyapunov theorem (Rudin 1973): For any $b$, $0 < b <
c$, there is a subset (event) $D \subset \Omega_c$ such that $P(D) = b$.


Let now $p$ be a fixed number, $0 < p < c$, chosen small enough that
$2p - p^2 < c$ (any $p < c/2$ will do). As a consequence of the
above cited result we can find three events, say $D_1, D_2, D_3$, which
are pairwise disjoint and such that

$$P(D_1) = p^2, \quad P(D_2) = p(1 - p), \quad P(D_3) = p(1 - p)$$

(the measure of $D_1 \cup D_2 \cup D_3$ is $2p - p^2 < c$). Define the events
$A = D_1 \cup D_2$ and $B = D_1 \cup D_3$.

Obviously $P(A) = p$, $P(B) = p$ and since $AB = D_1$ where $P(D_1) =
p^2$, we get

$$P(AB) = P(A)P(B)$$

and $A$ and $B$ are non-trivial events.

Therefore a non-purely discrete probability space (the measure $P$
has a 'continuous' part) cannot be totally dependent.

Notice that the examples of Bernstein, Bohlmann and their
inverses (Example 3.2) are purely discrete. They all can be realized
on probability spaces which are non-purely discrete, that is, on
spaces with at least a partially 'continuous' part.


4.2. On the Borel-Cantelli lemma and its corollaries 

Let $\{A_n, n \ge 1\}$ be a sequence of events in the probability space
$(\Omega, \mathcal{F}, P)$. Define the event $A^* = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$. Then $A^* = \{A_n$
i.o.$\}$: that is, infinitely many $A_n$ occur (i.o. means infinitely often).

The following result (the Borel-Cantelli lemma) can be found in
almost all textbooks on probability theory.

(a) If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P[A_n \text{ i.o.}] = 0$.
(b) If $\sum_{n=1}^{\infty} P(A_n) = \infty$ and $A_1, A_2, \ldots$ are independent, then $P[A_n \text{ i.o.}] = 1$.


We show by an example that in general the converse of (a) is not
true, and that the independence condition in (b) is essential.

Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and let $P$ be the Lebesgue measure.
Consider the following sequence of events: $A_n = [0, 1/n]$, $n = 1, 2, \ldots$.
Then obviously we have $A_n \downarrow$ as $n \to \infty$ and
$[A_n \text{ i.o.}] = \bigcap_{n=1}^{\infty} A_n = \{0\}$, so that $P[A_n \text{ i.o.}]
= 0$. However, $\sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} (1/n) = \infty$. It follows that
the converse of (a) is not true. Looking at (b), we see that the
condition $\sum_{n=1}^{\infty} P(A_n) = \infty$ does not imply that $P[A_n \text{ i.o.}] = 1$
and thus the independence of $A_1, A_2, \ldots$ is essential.


4.3. When can a set of events be both exhaustive and 
independent? 


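Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{A_i, i \in \Lambda\}$ a non-empty
set of events. ($\Lambda$ denotes some non-empty index set.) This set is
called independent if for any $k \ge 2$ and any subset $A_{i_1}, \ldots, A_{i_k}$,
where $i_1, \ldots, i_k \in \Lambda$,

$$P(A_{i_1} \cdots A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k}).$$

The set is called exhaustive if

$$P\left(\bigcup_{i \in \Lambda} A_i\right) = 1.$$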


The following question arises naturally: is it possible for the set
$\{A_i\}$ to be both exhaustive and independent? The answer will be
given for two cases: when the set $\{A_i\}$ consists of a finite or of an
infinite number of events.


(i) Let the index set $\Lambda$ be finite. Suppose $\{A_i, i \in \Lambda\}$ is an
independent set. Then so is the set $\{A_i^c, i \in \Lambda\}$, and

$$P\left(\bigcup_{i \in \Lambda} A_i\right) = 1 - P\left(\bigcap_{i \in \Lambda} A_i^c\right) = 1 - \prod_{i \in \Lambda} P(A_i^c). \tag{1}$$

Obviously, if for all $i \in \Lambda$, $P(A_i^c) > 0$, then the right-hand side of
(1) becomes less than 1 and $\{A_i, i \in \Lambda\}$ cannot be exhaustive.
However, if for some $i$ we have $P(A_i^c) = 0$, this means that $P(A_i) =
1$ and $A_i = \Omega$ (up to a set of probability zero). Therefore in this
trivial case only (compare Example 4.1) the finite set of independent
events can be exhaustive. Of course, a finite set $\{A_i, i \in \Lambda\}$ can be
exhaustive without being independent.


(ii) Here the index set $\Lambda = \mathbb{N}$. We shall construct two different sets
of independent events such that one of them is exhaustive and the
other is not.

Choose at random a number $x \in [0, 1]$. Let $A_i$ be the event that
the $i$th bit in the binary expansion of $x$ is zero. It is easy to check
that $A_1, A_2, \ldots$ are independent and moreover $P(A_i) = \frac{1}{2}$ for each $i$.
Thus $\sum_{i=1}^{\infty} P(A_i) = \infty$ and, according to the Borel-Cantelli
lemma (see Example 4.2), $P(\bigcup_{i} A_i) = 1$. Hence the set $\{A_i, i \geq 1\}$
is both independent and exhaustive.

Consider now another set $\{B_i, i \geq 1\}$ defined by

$$B_i = \bigcap_{j=r}^{s} A_j \quad \text{where } r = \tfrac{1}{2} i(i-1) + 1, \ s = \tfrac{1}{2} i(i+1).$$

$B_1$ is the event that the first bit in the binary expansion of $x$ is zero,
$B_2$ that the second and the third bits are zero, $B_3$ that the next three
bits are zero, and so on. Since $P(B_i) = 2^{-i}$ we have
$\sum_{i=1}^{\infty} P(B_i) < \infty$ and
$P(\bigcup_{i} B_i) = 1 - \prod_{i=1}^{\infty} (1 - 2^{-i}) < 1$. Hence $\{B_i, i \geq 1\}$ is a set
of independent events which, however, is not exhaustive.
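The two union probabilities in case (ii) can be checked numerically. The following is only a sketch: it relies on the independence of the events and on the identity $P(\bigcup_i C_i) = 1 - \prod_i (1 - P(C_i))$ for independent events, and the function name `union_probability` is a hypothetical helper, not notation from the text.

```python
from fractions import Fraction

def union_probability(probs):
    """P(C_1 ∪ ... ∪ C_n) for independent events with the given probabilities."""
    prod = Fraction(1)
    for p in probs:
        prod *= 1 - p
    return 1 - prod

n_terms = 60
# A_i: the i-th binary digit is zero, P(A_i) = 1/2
p_union_A = float(union_probability([Fraction(1, 2)] * n_terms))
# B_i: a block of i consecutive digits is zero, P(B_i) = 2^{-i}
p_union_B = float(union_probability([Fraction(1, 2 ** i) for i in range(1, n_terms + 1)]))

print(p_union_A)  # tends to 1 as n_terms grows
print(p_union_B)  # stays bounded away from 1
```

The truncated product for the $B_i$ stabilises quickly, which suggests that the limit of $P(\bigcup_i B_i)$ is strictly below 1, in agreement with the argument above.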


4.4. How are independence and exchangeability related? 


Let us consider a finite collection of random events $\mathcal{A}_n = \{A_1, \ldots, A_n\}$,
$n \geq 2$, in a probability space. $\mathcal{A}_n$ is said to be exchangeable
(also symmetrically dependent) if the probability $P(A_{i_1} \cdots A_{i_k})$ is
the same for all possible choices of $k$ events, $k \geq 1$,
$1 \leq i_1 < i_2 < \ldots < i_k \leq n$. In other words, there are numbers
$p_1, p_2, \ldots, p_{n-1}$, all in $(0, 1)$, such that

$$P(A_i) = p_1 \text{ for all } i; \quad P(A_i A_j) = p_2 \text{ for all } i < j; \quad P(A_i A_j A_l) = p_3 \text{ for all } i < j < l, \text{ etc.}$$

As with the independence property, we can introduce the term
exchangeability at level $k$ for a fixed $k$, meaning that $P(A_{i_1} \cdots A_{i_k})$
is the same for all choices of just $k$ events from $\mathcal{A}_n$, regardless of
what happens at levels higher or lower than $k$. It turns out that
the collection $\mathcal{A}_n$ can be such that the exchangeability property
holds at some levels but does not hold at others. Thus $\mathcal{A}_n$ is
totally exchangeable (or simply exchangeable) if it obeys this property
at all levels $k$, $k = 1, 2, \ldots, n-1$ (for $k = n$ we have only one event,
namely $A_1 A_2 \cdots A_n$).

It is easy to see that if $\mathcal{A}_n$ is exchangeable at level 1
($P(A_1) = \ldots = P(A_n) = p_1$) and $\mathcal{A}_n$ is mutually independent, then
obviously $\mathcal{A}_n$ is totally exchangeable (now $P(A_i A_j) = p_1^2$ for all
$i < j$; $P(A_i A_j A_l) = p_1^3$ for all $i < j < l$, etc.). If, however,
$\mathcal{A}_n$ is mutually independent but there are different numbers among
$P(A_1), \ldots, P(A_n)$, then $\mathcal{A}_n$ is not exchangeable at all.

We can return to Example 3.5 and derive additional
conclusions about the validity of the exchangeability property
(total or partial, only at some levels).

Let us turn to another example. 

Suppose we have at our disposal 192 cards on which in a special 
way numbers are written such that: 110 cards are marked by a 
‘triplet’ (each of 123, 124, 125, 134, 135, 145, 234, 235, 245, 345 
is written 11 times); 30 cards are marked by a ‘quartet’ (each of 
1234, 1235, 1245, 1345, 2345 is written six times); six cards are 
marked by the ‘quintet’ 12345; the remaining 46 cards are blank. 
All 192 cards are put into a box and well mixed. We are interested 
in the following five events: 


$A_i$ = {randomly chosen card contains the number $i$}, $i = 1, 2, 3, 4, 5$.

It is easy to check that for all possible indices $i, j, l, s$ we have:

$$P(A_1) = P(A_2) = P(A_3) = P(A_4) = P(A_5) = \tfrac{1}{2},$$
$$P(A_i A_j) = \tfrac{19}{64}, \ i < j; \qquad P(A_i A_j A_l) = \tfrac{29}{192}, \ i < j < l;$$
$$P(A_i A_j A_l A_s) = \tfrac{1}{16}, \ i < j < l < s; \qquad P(A_1 A_2 A_3 A_4 A_5) = \tfrac{1}{32}.$$

Thus we arrive at the following two conclusions for these five
events, namely: (a) they are dependent at level 2, dependent at
level 3, independent at level 4 and independent at level 5; (b) they
are totally exchangeable.

The final conclusion is that these two properties of random
events, independence and exchangeability, are not related: neither
of them implies the other.
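The probabilities in the card example can be verified by brute-force enumeration of the 192 cards. The sketch below uses exact rational arithmetic; the names `cards` and `joint_probabilities` are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations

digits = (1, 2, 3, 4, 5)

# Build the box: 10 triplets x 11, 5 quartets x 6, the quintet x 6, 46 blanks.
cards = []
for combo in combinations(digits, 3):
    cards += [frozenset(combo)] * 11
for combo in combinations(digits, 4):
    cards += [frozenset(combo)] * 6
cards += [frozenset(digits)] * 6
cards += [frozenset()] * 46

def joint_probabilities(k):
    """Set of values P(A_{i1} ... A_{ik}) over all choices of k distinct indices."""
    return {
        Fraction(sum(1 for card in cards if set(idx) <= card), len(cards))
        for idx in combinations(digits, k)
    }

levels = {k: joint_probabilities(k) for k in range(1, 6)}
for k, values in levels.items():
    # One value per level means exchangeability at that level;
    # the value (1/2)^k would mean independence at that level.
    print(k, values, Fraction(1, 2) ** k)
```

Each level yields a single value, so the events are exchangeable at every level, while the value equals $(1/2)^k$ only for $k = 4$ and $k = 5$.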


4.5. A sequence of random events which is stable but not 
mixing 
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{A_n, n \geq 1\}$, $A_n \in \mathcal{F}$, a
sequence of events such that for every $B \in \mathcal{F}$,

$$\lim_{n \to \infty} P(A_n B) = \lambda P(B)$$

where $\lambda$ is a constant not depending on $B$, $0 < \lambda < 1$. Then $\{A_n\}$ is
called a mixing sequence with density $\lambda$ (see Rényi 1970). In this
case it is usual to speak about mixing in the sense of ergodic theory
(see Doukhan 1994).

The mixing property can be extended as follows. The sequence
$\{A_n\}$ is called a stable sequence of events if for any $B \in \mathcal{F}$ the
following limit exists:

$$\lim_{n \to \infty} P(A_n B) = Q(B).$$

According to Rényi (1970), $Q$ is a measure on $\mathcal{F}$ which is
absolutely continuous with respect to $P$. The Radon-Nikodym
derivative $dQ/dP = \alpha(\omega)$ exists and for every $B \in \mathcal{F}$,
$Q(B) = \int_B \alpha(\omega)\, dP$. Here $0 \leq \alpha(\omega) \leq 1$ with probability 1. The r.v. $\alpha$ is
called a (local) density of the stable sequence $\{A_n\}$.

If $\alpha = \lambda$ = constant a.s., $0 < \lambda < 1$, clearly the stable sequence
$\{A_n\}$ is mixing and has density $\lambda$. However, if $\alpha$ is not a constant,
the stable sequence $\{A_n\}$ cannot be mixing. Let us illustrate this
statement by an example.

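In the probability space $(\Omega, \mathcal{F}, P)$ let $B_1 \in \mathcal{F}$, $0 < P(B_1) < 1$ and
$B_2 = B_1^c$. Consider two spaces, $(\Omega, \mathcal{F}, P_1)$ and $(\Omega, \mathcal{F}, P_2)$ where

$$P_1(A) = P(A \mid B_1), \quad P_2(A) = P(A \mid B_2) \quad \text{for each } A \in \mathcal{F}.$$

Suppose that $\{A_n'\}$ is a mixing sequence in $(\Omega, \mathcal{F}, P_1)$ with
density $\lambda_1$ and $\{A_n''\}$ a mixing sequence in $(\Omega, \mathcal{F}, P_2)$ with density
$\lambda_2$ where $0 < \lambda_1 < \lambda_2 < 1$. Put $A_n = A_n' B_1 \cup A_n'' B_2$. Then for
every $B \in \mathcal{F}$ we have

$$P(A_n B) = P(B_1) P_1(A_n' B) + P(B_2) P_2(A_n'' B)$$

and hence

$$\lim_{n \to \infty} P(A_n B) = Q(B) \quad \text{where } Q(B) = \lambda_1 P(B B_1) + \lambda_2 P(B B_2).$$

Define the r.v. $\alpha = \alpha(\omega)$ as follows: $\alpha(\omega) = \lambda_1$ if $\omega \in B_1$, and
$\alpha(\omega) = \lambda_2$ if $\omega \in B_2$. Then

$$Q(B) = \int_B \alpha(\omega)\, dP.$$

It follows that the sequence $\{A_n\}$ is stable but not mixing, since
its density is not constant but takes two different values with
positive probabilities.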

As noted by Rényi (1970), in a similar way we can construct a 
stable sequence of events such that its density has an arbitrarily 
prescribed discrete distribution. 


Chapter 2 


Random Variables and Basic Characteristics 


Courtesy of Professor A. T. Fomenko of Moscow University 


SECTION 5. DISTRIBUTION FUNCTIONS OF RANDOM 
VARIABLES 


Let $F(x)$, $x \in \mathbb{R}^1$, be a function satisfying the conditions:

(a) $F$ is non-decreasing, that is $x_1 < x_2 \Rightarrow F(x_1) \leq F(x_2)$;

(b) $F$ is right-continuous and has left-hand limits at each $x \in \mathbb{R}^1$,
that is $F(x+) := \lim_{u \downarrow x} F(u) = F(x)$;

(c) $\lim_{x \to -\infty} F(x) = 0$, $\lim_{x \to \infty} F(x) = 1$.

Any function $F$ satisfying conditions (a), (b) and (c) above is
said to be a distribution function (d.f.).

Now let $(\Omega, \mathcal{F}, P)$ be a probability space. Denote by $\mathcal{B}^1$ the Borel
$\sigma$-field of the real line $\mathbb{R}^1 = (-\infty, \infty)$. Recall that any measurable
function $X : (\Omega, \mathcal{F}) \to (\mathbb{R}^1, \mathcal{B}^1)$ is called a random variable
(r.v.). By the equality $P_X(B) = P(X^{-1}(B))$, $B \in \mathcal{B}^1$, we define a
probability measure on $\mathcal{B}^1$. Using the properties of the probability $P$
(see the introductory notes to Section 3) we can easily show that the
function

$$F_X(x) = P_X((-\infty, x]), \quad x \in \mathbb{R}^1$$

satisfies the above conditions (a), (b) and (c) and hence $F_X$ is a d.f.
In such a case we say that $F_X$ is the d.f. of the r.v. $X$.

If there is a countable set of numbers $x_1, x_2, \ldots$ (finite or infinite)
such that $F_X(x_n) - F_X(x_n-) = p_n > 0$, $\sum_n p_n = 1$, then the
d.f. $F_X$ is called discrete. The probability measure $P_X$ is also
discrete, that is $P_X$ is concentrated at the points $x_1, x_2, \ldots$, called
atoms, and $P_X(\{x_n\}) = F_X(x_n) - F_X(x_n-) > 0$, $\sum_n P_X(\{x_n\}) = 1$.
The set $\{p_1, p_2, \ldots\}$ is called a discrete probability distribution and
$X$ a discrete r.v. with values in the set $\{x_1, x_2, \ldots\}$ and with a
distribution $\{p_1, p_2, \ldots\}$. Clearly $P[X = x_n] = p_n$, $n = 1, 2, \ldots$.

The d.f. $F_X$ is said to be absolutely continuous if there is a
non-negative and integrable function $f(x)$, $x \in \mathbb{R}^1$, such that

$$F_X(x) = \int_{-\infty}^{x} f(u)\, du \quad \text{for all } x \in \mathbb{R}^1.$$

Here $f$ is called a probability density function (simply, density) of
the d.f. $F_X$ as well as of the r.v. $X$ and of the measure $P_X$.

Let us note that there are measures whose d.f.s are continuous 
but have their points of increase on sets of zero Lebesgue measure. 
Such measures and distributions are called singular. They will not 
be treated here. The interested reader is referred to the books by 
Feller (1971), Rao (1984), Billingsley (1995) and Shiryaev (1995). 

Now we shall define the multi-dimensional d.f. For the $n$-dimensional
random vector $(X_1, \ldots, X_n)$, consider the function

$$G(x_1, \ldots, x_n) = P[X_1 \leq x_1, \ldots, X_n \leq x_n], \quad (x_1, \ldots, x_n) \in \mathbb{R}^n.$$

It is easy to derive the following properties of $G$:

($a_1$) $G(x_1, \ldots, x_n)$ is non-decreasing in each of its arguments;
($b_1$) $G(x_1, \ldots, x_n)$ is right-continuous in each of its arguments;
($c_1$) $G(x_1, \ldots, x_n) \to 0$ as $x_j \to -\infty$ for at least one $j$;
$G(x_1, \ldots, x_n) \to 1$ as $x_j \to \infty$ for all $j = 1, \ldots, n$;
(d) if $a_i \leq b_i$, $i = 1, \ldots, n$ and

$$\Delta_{a_i, b_i} G(x_1, \ldots, x_n) = G(x_1, \ldots, x_{i-1}, b_i, x_{i+1}, \ldots, x_n) - G(x_1, \ldots, x_{i-1}, a_i, x_{i+1}, \ldots, x_n)$$

then

$$\Delta_{a_1, b_1} \Delta_{a_2, b_2} \cdots \Delta_{a_n, b_n} G(x_1, \ldots, x_n) \geq 0.$$

Any function $G$ satisfying conditions ($a_1$), ($b_1$), ($c_1$) and (d) is
called an $n$-dimensional d.f. Actually $G$ is the d.f. of the given
random vector.

Analogously to the one-dimensional case we can define the
notion of a discrete multi-dimensional d.f. Further, we say that the
d.f. $G$ and the random vector $(X_1, \ldots, X_n)$ are absolutely continuous
with density $g(x_1, \ldots, x_n)$, $(x_1, \ldots, x_n) \in \mathbb{R}^n$, if

$$G(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} g(u_1, \ldots, u_n)\, du_1 \cdots du_n$$

for all $(x_1, \ldots, x_n) \in \mathbb{R}^n$. Here $g$ is non-negative and its integral over
$\mathbb{R}^n$ is 1.

The marginal d.f. $G_i(x_i) = P[X_i \leq x_i]$ is obtained from $G$ in an
obvious way, putting $x_j = \infty$ for $j \neq i$. If we integrate $g(x_1, \ldots, x_n)$ in
the arguments $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$, each in $\mathbb{R}^1$, we obtain the
function $g_i(x_i)$ which is the marginal density of $X_i$, $i = 1, \ldots, n$.


We say that the r.v.s $X_1$ and $X_2$ are independent if

$$P[X_1 \in B_1, X_2 \in B_2] = P[X_1 \in B_1]\, P[X_2 \in B_2]$$

for any Borel sets $B_1$ and $B_2$ (that is, $B_1, B_2 \in \mathcal{B}^1$). Analogously to
the case of random events we can introduce the notion of mutual
and pairwise independence of a finite collection of r.v.s. If
$X_1, \ldots, X_n$ are $n$ r.v.s with d.f.s $F_1, \ldots, F_n$ respectively, and
$F(x_1, \ldots, x_n)$ is their joint d.f., then these variables are mutually
independent (simply independent) iff

$$F(x_1, \ldots, x_n) = F_1(x_1) \cdots F_n(x_n), \quad (x_1, \ldots, x_n) \in \mathbb{R}^n.$$

In terms of the corresponding densities the independence of the
r.v.s $X_1, \ldots, X_n$ is expressed in the form

$$f(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n), \quad (x_1, \ldots, x_n) \in \mathbb{R}^n.$$

Let us now define the unimodality property for an absolutely
continuous d.f. $F$: $F(x)$, $x \in \mathbb{R}^1$, is said to be unimodal with its mode
(or vertex) at the point $x_0 \in \mathbb{R}^1$ if $F(x)$ is convex for $x < x_0$ and
concave for $x > x_0$.

For a detailed description of the properties of one-dimensional
and multi-dimensional d.f.s we refer the reader to the works
of Feller (1971), Chow and Teicher (1978), Laha and Rohatgi
(1979) and Shiryaev (1995).


5.1. Equivalent random variables are identically 
distributed but the converse is not true 


Consider two r.v.s $X$ and $Y$ on the same probability space $(\Omega, \mathcal{F}, P)$
and suppose they are equivalent, that is $P\{\omega : X(\omega) \neq Y(\omega)\} = 0$.
Hence

$$F_X(x) = P\{\omega : X(\omega) \leq x\} = P\{\omega : Y(\omega) \leq x\} = F_Y(x) \quad \text{for each } x \in \mathbb{R}^1.$$

Thus $X$ and $Y$ are identically distributed. In such a case we use the
following notation: $X \overset{d}{=} Y$.

To see that the converse is not true, take the r.v. $X$ which is
absolutely continuous and symmetric with respect to the origin. Let
$Y = -X$. Then the symmetry of $X$ implies that $F_X = F_Y$. Further, as
a consequence of the absolute continuity of $X$, we obtain

$$P\{\omega : X(\omega) = Y(\omega)\} = P\{\omega : X(\omega) = -X(\omega)\} = P\{\omega : X(\omega) = 0\} = 0.$$

Therefore $X \overset{d}{=} Y$; however $X$ and $Y$ are not equivalent.

The same conclusion can be drawn if $X$ is any discrete r.v. which
is symmetric with respect to 0 and such that $P\{X = 0\} < 1$. (The
last condition excludes the trivial case.) This means that $X$ takes a
finite or infinite number of values $\ldots, -x_2, -x_1, x_0 = 0, x_1, x_2, \ldots$
with probabilities $p_0 = P\{X = 0\} < 1$,
$p_j = P\{X = x_j\} = P\{X = -x_j\}$, $j = 1, 2, \ldots$, $p_0 + 2\sum_{j \geq 1} p_j = 1$.


5.2. If X, Y, Z are random variables on the same
probability space, then $X \overset{d}{=} Y$ does not always imply
that $XZ \overset{d}{=} YZ$


Let $X$ and $Y$ be r.v.s (defined, perhaps, on different probability
spaces). It is well known that if $X \overset{d}{=} Y$ and $g(x)$, $x \in \mathbb{R}^1$, is a
$\mathcal{B}^1$-measurable function, then $g(X)$ and $g(Y)$ are also r.v.s and
$g(X) \overset{d}{=} g(Y)$. This fact could suggest the following conjecture. If
$X$, $Y$ and $Z$ are defined on the same probability space, then

$$X \overset{d}{=} Y \ \Rightarrow \ XZ \overset{d}{=} YZ \quad \text{for any } X, Y, Z.$$

A simple example will show that in general this is not true. Let
the r.v. $X$ have a symmetric distribution and let $Y = -X$. Then
$X \overset{d}{=} Y$. Now take $Z = Y$, that is $Z = -X$. Then the equality
$XZ \overset{d}{=} YZ$ is impossible because $XZ = -X^2$ and $YZ = (-X)(-X) = X^2$.
It suffices to note that all values of $XZ$ are non-positive while
those of $YZ$ are non-negative. The trivial case $P\{X = 0\} = 1$ is of no
interest.
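A minimal discrete illustration of this example, assuming for concreteness that $X$ takes the values $-1$ and $+1$ with probability $\frac{1}{2}$ each (this particular choice of $X$ and the helper name `law` are assumptions made only for the sketch):

```python
from collections import Counter
from fractions import Fraction

# Assumed symmetric law of X: P{X = -1} = P{X = 1} = 1/2
outcomes = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

def law(transform):
    """Distribution of transform(X) as a dict value -> probability."""
    dist = Counter()
    for x, p in outcomes.items():
        dist[transform(x)] += p
    return dict(dist)

law_X = law(lambda x: x)
law_Y = law(lambda x: -x)            # Y = -X
law_XZ = law(lambda x: x * (-x))     # Z = Y = -X, so XZ = -X^2
law_YZ = law(lambda x: (-x) * (-x))  # YZ = X^2

print(law_X == law_Y)   # X and Y are identically distributed
print(law_XZ, law_YZ)   # XZ and YZ are not
```

Here $XZ$ is concentrated at $-1$ and $YZ$ at $+1$, although $X$ and $Y$ have the same law.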


5.3. Different distributions can be transformed by 
different functions to the same distribution 




Suppose $\xi$ is a r.v. in $\mathbb{R}^1$ and $g_1(x) \neq g_2(x)$, $x \in \mathbb{R}^1$, are
Borel-measurable functions. Then, as a rule, $g_1(\xi)$ and $g_2(\xi)$ are r.v.s with
different distributions, i.e. $g_1(\xi) \overset{d}{\neq} g_2(\xi)$ (except trivial cases
involving symmetry-type properties).

Further, if $\xi_1$ and $\xi_2$ are r.v.s and $g(x)$, $x \in \mathbb{R}^1$, is a
Borel-measurable function, then $\xi_1 \overset{d}{\neq} \xi_2$ implies that
$g(\xi_1) \overset{d}{\neq} g(\xi_2)$ (again except easy cases).

These two facts make it interesting to describe explicitly two
r.v.s $\xi_1$ and $\xi_2$ and two Borel-measurable functions $g_1$ and $g_2$ such
that $\xi_1 \overset{d}{\neq} \xi_2$, $g_1 \neq g_2$, but $g_1(\xi_1) \overset{d}{=} g_2(\xi_2)$. The
multi-dimensional case is also of interest.

(i) Consider the r.v. $\xi_1 \sim N(0, \frac{1}{2})$, normally distributed with zero
mean and variance $\frac{1}{2}$, and the r.v. $\xi_2 \sim \gamma(a)$, $a > 0$, i.e. $\xi_2$ has a
gamma distribution with density $(1/\Gamma(a))\, x^{a-1} e^{-x}$ if $x > 0$ and 0
otherwise. Take also the functions $g_1(x) = |x|^p$ and $g_2(x) = |x|^\beta$,
$x \in \mathbb{R}^1$, where $p > 0$ and $\beta > 0$ are arbitrary numbers. Let us see how the
two r.v.s $\eta_1 = g_1(\xi_1) = |\xi_1|^p$ and $\eta_2 = g_2(\xi_2) = |\xi_2|^\beta = \xi_2^\beta$ are
connected. If $f_1$ and $f_2$ are the densities of $\eta_1$ and $\eta_2$ respectively,
we find that $f_1(x) = f_2(x) = 0$ if $x \leq 0$ and, for $x > 0$,

$$f_1(x) = \frac{2}{p\sqrt{\pi}}\, x^{1/p - 1} \exp(-x^{2/p}), \qquad f_2(x) = \frac{1}{\beta\, \Gamma(a)}\, x^{a/\beta - 1} \exp(-x^{1/\beta}).$$

Now let us keep $p$ fixed and take $a = \frac{1}{2}$ and $\beta = p/2$. Hence
$\xi_1 \sim N(0, \frac{1}{2})$ as before, $\xi_2 \sim \gamma(\frac{1}{2})$ and taking into account that
$\Gamma(\frac{1}{2}) = \sqrt{\pi}$ and comparing $f_1$ and $f_2$, we conclude that the r.v.s $\eta_1$
and $\eta_2$ have the same distribution.

Therefore two different r.v.s $\xi_1 \sim N(0, \frac{1}{2})$ and $\xi_2 \sim \gamma(\frac{1}{2})$ can be
transformed by different functions to identically distributed r.v.s:

$$|\xi_1|^p \overset{d}{=} |\xi_2|^{p/2}, \quad p > 0.$$
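As a quick sanity check of case (i), the two density formulas can be compared on a grid after the substitution $a = \frac{1}{2}$, $\beta = p/2$. The sketch below assumes the densities $f_1(x) = \frac{2}{p\sqrt{\pi}} x^{1/p-1} \exp(-x^{2/p})$ and $f_2(x) = \frac{1}{\beta\Gamma(a)} x^{a/\beta-1} \exp(-x^{1/\beta})$ obtained by the usual change of variables; the value $p = 3$ and the grid are arbitrary illustrative choices.

```python
import math

def f1(x, p):
    """Density of |xi_1|^p at x > 0, where xi_1 ~ N(0, 1/2)."""
    return 2.0 / (p * math.sqrt(math.pi)) * x ** (1.0 / p - 1.0) * math.exp(-x ** (2.0 / p))

def f2(x, a, beta):
    """Density of xi_2^beta at x > 0, where xi_2 ~ gamma(a)."""
    return x ** (a / beta - 1.0) * math.exp(-x ** (1.0 / beta)) / (beta * math.gamma(a))

p = 3.0
grid = [0.1 * k for k in range(1, 51)]

# With a = 1/2 and beta = p/2 the two densities should coincide.
max_gap_matched = max(abs(f1(x, p) - f2(x, 0.5, p / 2.0)) for x in grid)
# With another shape parameter (here a = 1) they should differ.
max_gap_other = max(abs(f1(x, p) - f2(x, 1.0, p / 2.0)) for x in grid)

print(max_gap_matched, max_gap_other)
```

The first gap is at the level of rounding error, while the second is not, which is consistent with the choice $a = \frac{1}{2}$ being essential.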


(ii) Here is another case involving more than two variables. Take
three r.v.s $\xi$, $\eta$ and $\theta$ where $\xi \sim N(0, 1)$, $\eta \sim \mathrm{Exp}(1)$ (exponential
distribution with parameter 1) and $\theta$ follows the arcsine law, that is,
the density of $\theta$ is $1/(\pi\sqrt{x(1-x)})$ on $(0, 1)$ and 0 otherwise.

Consider now three new r.v.s, namely

$$\log(\tfrac{1}{2}\xi^2), \quad \log(\eta), \quad \log(\theta)$$

and denote by $\psi_1, \psi_2, \psi_3$ their ch.f.s, respectively. Then

$$\psi_1(t) = \Gamma(\tfrac{1}{2} + it)/\sqrt{\pi}, \quad \psi_2(t) = \Gamma(1 + it), \quad \psi_3(t) = \Gamma(\tfrac{1}{2} + it)/(\sqrt{\pi}\, \Gamma(1 + it)).$$

By using the obvious identity $\psi_2(t)\psi_3(t) = \psi_1(t)$ for all $t$ and
assuming that $\eta$ and $\theta$ are independent, we easily arrive at the
relation

$$\xi^2 \overset{d}{=} 2\eta\theta.$$

Note that the same conclusion can be derived directly by writing

$$\xi^2 = 2 \cdot \tfrac{1}{2}(\xi^2 + \zeta^2) \cdot \frac{\xi^2}{\xi^2 + \zeta^2}$$

where $\zeta \sim N(0, 1)$ is independent of $\xi$ and showing that the two
factors $\tfrac{1}{2}(\xi^2 + \zeta^2)$ and $\xi^2/(\xi^2 + \zeta^2)$ are independent and follow
exponential and arcsine laws respectively.
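The relation $\xi^2 \overset{d}{=} 2\eta\theta$ can be sanity-checked by comparing moments, which is the real-argument analogue of the ch.f. identity for the logarithms. The sketch below assumes the standard formulas $E[(\xi^2)^s] = 2^s\Gamma(s + \frac{1}{2})/\sqrt{\pi}$, $E[\eta^s] = \Gamma(s + 1)$ and $E[\theta^s] = \Gamma(s + \frac{1}{2})/(\sqrt{\pi}\,\Gamma(s + 1))$ for $s \geq 0$, together with the independence of $\eta$ and $\theta$; agreement of moments is a consistency check, not a proof.

```python
import math

def moment_xi_squared(s):
    """E[(xi^2)^s] for xi ~ N(0, 1)."""
    return 2.0 ** s * math.gamma(s + 0.5) / math.sqrt(math.pi)

def moment_two_eta_theta(s):
    """E[(2 * eta * theta)^s] for independent eta ~ Exp(1), theta ~ arcsine law."""
    eta_s = math.gamma(s + 1.0)
    theta_s = math.gamma(s + 0.5) / (math.sqrt(math.pi) * math.gamma(s + 1.0))
    return 2.0 ** s * eta_s * theta_s

orders = (0.5, 1.0, 2.0, 3.5, 6.0)
rel_gaps = [abs(moment_xi_squared(s) / moment_two_eta_theta(s) - 1.0) for s in orders]

print(moment_xi_squared(1.0), moment_xi_squared(2.0))  # E[xi^2] and E[xi^4]
print(max(rel_gaps))
```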

The final remark is that all r.v.s considered in cases (i) and (ii)
are absolutely continuous. Similar examples for discrete variables
can also be constructed. This is left to the reader (try to avoid some
trivial cases).


5.4. A function which is a metric on the space of 
distributions but not on the space of random 
variables 


Let us define the distance $r(X, Y)$ between any two r.v.s $X$ and $Y$ by

$$r(X, Y) = \sup_{x \in \mathbb{R}^1} |P\{X \leq x\} - P\{Y \leq x\}| = \sup_{x \in \mathbb{R}^1} |F_X(x) - F_Y(x)|.$$

($r$ is called the uniform distance or Kolmogorov distance.)
Another suitable notation for $r(X, Y)$ is $r(F_X, F_Y)$, where $F_X$ and
$F_Y$ are the d.f.s of $X$ and $Y$ respectively. The function $r$, considered on
the space of all distribution functions on $\mathbb{R}^1$, is a metric. Indeed, it is
easy to see that: (i) $r(X, Y) \geq 0$ and $r(X, Y) = 0$ iff $F_X = F_Y$;
(ii) $r(X, Y) = r(Y, X)$; (iii) $r(X, Y) \leq r(X, Z) + r(Z, Y)$. In (i)-(iii), $X$, $Y$ and $Z$
are arbitrary r.v.s.

Suppose now that the function $r$ is considered on the space of
the r.v.s on the underlying probability space. Then, referring to
Example 5.1 above, we conclude that $r$ is not a metric because it
violates the condition that $r(X, Y) = 0$ implies $X = Y$.


5.5. On the n-dimensional distribution functions 


Comparing the definitions of one-dimensional and multi-dimensional
d.f.s, we see that in the one-dimensional case
condition (d) is implied by ($a_1$). However, if $n > 1$ then (d) is no
longer a consequence of ($a_1$) and even for $n = 2$ it is easy to
construct a function $F(x, y)$ satisfying ($a_1$), ($b_1$) and ($c_1$) but not
(d). For example, take the function

$$F(x, y) = \begin{cases} 0, & \text{if } x < 0 \text{ or } x + y < 1 \text{ or } y < 0 \\ 1, & \text{otherwise.} \end{cases}$$

Obviously ($a_1$), ($b_1$) and ($c_1$) are satisfied. Suppose $F$ is a d.f. of
some random vector, say $(X, Y)$. Then for every parallelepiped
$Q = [a_1, b_1] \times [a_2, b_2]$ (here a rectangle) we must have
$P\{(X, Y) \in Q\} \geq 0$. However, if $Q = [\frac{1}{3}, 1] \times [\frac{1}{3}, 1]$ then

$$P\{(X, Y) \in Q\} = F(1, 1) - F(1, \tfrac{1}{3}) - F(\tfrac{1}{3}, 1) + F(\tfrac{1}{3}, \tfrac{1}{3}) = -1$$

which is impossible. Therefore conditions ($a_1$), ($b_1$) and ($c_1$) are
not sufficient for $F$ to be a d.f. in the $n$-dimensional case when
$n \geq 2$.

Let us suggest one additional example. Define

$$G(x, y) = \begin{cases} 0, & \text{if } x < 0 \text{ or } y < 0 \\ \min\{1, \max\{x, y\}\}, & \text{otherwise.} \end{cases}$$

It can be checked that $G$ satisfies conditions ($a_1$), ($b_1$) and ($c_1$) but
not (d). It is sufficient here to take the rectangle $R = [x_1, x_2] \times [y_1, y_2]$,
where $0 \leq x_1 < y_1 < x_2 < y_2 \leq 1$, and calculate the probability
$P[(X, Y) \in R]$.
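Both rectangle computations are easy to reproduce. The sketch below assumes that the first function is $F(x, y) = 0$ if $x < 0$ or $y < 0$ or $x + y < 1$ and $F(x, y) = 1$ otherwise, and that the second is $G(x, y) = \min\{1, \max\{x, y\}\}$ on the non-negative quadrant; the helper `rectangle_increment` and the particular rectangle chosen for $G$ are illustrative assumptions.

```python
from fractions import Fraction

def F(x, y):
    return 0 if (x < 0 or y < 0 or x + y < 1) else 1

def G(x, y):
    return 0 if (x < 0 or y < 0) else min(1, max(x, y))

def rectangle_increment(H, a1, b1, a2, b2):
    """Second-order difference of H over [a1, b1] x [a2, b2], as in condition (d)."""
    return H(b1, b2) - H(a1, b2) - H(b1, a2) + H(a1, a2)

third = Fraction(1, 3)
inc_F = rectangle_increment(F, third, 1, third, 1)

# A rectangle with 0 <= x1 < y1 < x2 < y2 <= 1, here x1 = 0.1, y1 = 0.2, x2 = 0.3, y2 = 0.4
x1, y1, x2, y2 = (Fraction(k, 10) for k in (1, 2, 3, 4))
inc_G = rectangle_increment(G, x1, x2, y1, y2)

print(inc_F, inc_G)  # both negative, so condition (d) fails
```

For $G$ the increment equals $y_1 - x_2$, which is negative for every rectangle with the stated ordering of its corners.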


5.6. The continuity property of one-dimensional 
distributions may fail in the multi-dimensional case 


Let $X$ be a r.v. on the probability space $(\Omega, \mathcal{F}, P)$ and $F$ its d.f.
Suppose the values of $X$ fill some interval (finite or infinite) in $\mathbb{R}^1$
and for each $x$ of this interval, $P[\omega : X(\omega) = x] = 0$, that is the
probability measure $P_X$ has no atoms. Then $F(x)$ is continuous in $x$
everywhere in this interval. Thus we come naturally to the
following question: does an analogous property hold in the
multi-dimensional case? By an example for $n = 2$ we show that in
general the answer is negative. Indeed, consider the following
function in the plane:

$$F(x, y) = \begin{cases} xy, & \text{if } 0 \leq x < 1, \ 0 \leq y < \tfrac{1}{2} \\ \tfrac{1}{2}x, & \text{if } 0 \leq x < 1, \ \tfrac{1}{2} \leq y < \infty \\ y, & \text{if } 1 \leq x < \infty, \ 0 \leq y < 1 \\ 1, & \text{if } 1 \leq x < \infty, \ 1 \leq y < \infty \\ 0, & \text{otherwise.} \end{cases}$$

It is easy to check that $F$ is a two-dimensional d.f. Denote by $(X, Y)$
the random vector whose d.f. is $F$. We shall also use the notation in
Figure 1, where we have indicated the domains in $\mathbb{R}^2$ and the
corresponding values of $F$. Note that the vector $(X, Y)$ takes values
in the quadrant $Q = \{(x, y) : 0 \leq x < \infty, 0 \leq y < \infty\}$, and moreover
each point $(x, y) \in Q$ has zero probability. Following the
one-dimensional case we could expect that $F(x, y)$ is continuous
everywhere in $Q$. But this conjecture is false. Indeed, it is easily
seen that every point with coordinates $(1, y)$ where $\frac{1}{2} < y < \infty$ is a
discontinuity point of $F$. If $\Delta(1, y) := F(1, y) - F(1 - 0, y - 0)$ is the
size of the jump in $F$ at the point $(1, y)$, we find that $\Delta(1, y) = y - \frac{1}{2}$
for $\frac{1}{2} < y < 1$ and $\Delta(1, y) = \frac{1}{2}$ for $1 \leq y < \infty$. The reason for the
existence of this discontinuity of the d.f. $F$ is that there is a
'hyperplane' with strictly positive probability, namely
$P[X = 1, \frac{1}{2} < Y < 1] = \frac{1}{2}$ (see the bold vertical segment in Figure 1).
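The jump sizes can be reproduced numerically. The sketch below assumes the following piecewise form of $F$, which is consistent with the jump sizes stated in the text: $F = xy$ for $0 \leq x < 1$, $0 \leq y < \frac{1}{2}$; $F = \frac{1}{2}x$ for $0 \leq x < 1$, $y \geq \frac{1}{2}$; $F = y$ for $x \geq 1$, $0 \leq y < 1$; $F = 1$ for $x \geq 1$, $y \geq 1$; and $F = 0$ otherwise. The step `h` used to approximate the left limit is an arbitrary small number.

```python
def F(x, y):
    """Assumed piecewise form of the two-dimensional d.f. of this example."""
    if 0 <= x < 1 and 0 <= y < 0.5:
        return x * y
    if 0 <= x < 1 and y >= 0.5:
        return 0.5 * x
    if x >= 1 and 0 <= y < 1:
        return y
    if x >= 1 and y >= 1:
        return 1.0
    return 0.0

def jump_at_x1(y, h=1e-9):
    """Approximate Delta(1, y) = F(1, y) - F(1 - 0, y - 0)."""
    return F(1.0, y) - F(1.0 - h, y - h)

for y in (0.3, 0.75, 2.0):
    print(y, jump_at_x1(y))
```

Under this assumption there is no jump at $(1, y)$ for $y < \frac{1}{2}$, a jump of size $y - \frac{1}{2}$ for $\frac{1}{2} < y < 1$ and a jump of size $\frac{1}{2}$ for $y \geq 1$.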
Figure 1


5.7. On the absolute continuity of the distribution of a 
random vector and of its components 

Consider for simplicity the two-dimensional case. Suppose $(X_1, X_2)$
has an absolutely continuous distribution. Then it is easy to see
that each of $X_1$ and $X_2$ also has an absolutely continuous
distribution. The question now is whether the converse is true. To
see this, take $X_1$ to be absolutely continuous and let $X_2 = X_1$, that is
$X_2(\omega) = X_1(\omega)$ for each $\omega \in \Omega$. Evidently $X_2$ is absolutely
continuous. Suppose the vector $(X_1, X_2)$ has an absolutely
continuous distribution with some density, say $f$. Then the
following relation would hold:

$$P[(X_1, X_2) \in B] = \iint_B f(x_1, x_2)\, dx_1\, dx_2 \quad \text{for any set } B \in \mathcal{B}^2. \tag{1}$$

However, all values of the vector $(X_1, X_2)$ belong to the line $l : x_2 = x_1$.
If we take $B = l = \{(x_1, x_2) : x_2 = x_1\}$ then the left-hand side
of (1) is 1, but the right-hand side is 0 since the line $l$ has plane
measure 0. Hence $(X_1, X_2)$ is not absolutely continuously
distributed, but each of its components is.

Note that if $X_1$ and $X_2$ are independent and absolutely
continuous, then $(X_1, X_2)$ is also absolutely continuous.


5.8. There are infinitely many multi-dimensional
probability distributions with given marginals


If the random vector $(X_1, \ldots, X_n)$ has a d.f. $F(x_1, \ldots, x_n)$ then the
marginal distributions $F_k(x_k) = P[X_k \leq x_k]$, $k = 1, \ldots, n$, are uniquely
determined. By a few examples we show that the converse is not
true. It is sufficient here just to consider the two-dimensional case.
The examples treat the discrete, absolutely continuous and the
general cases.

(i) Let $p = \{p_{ij}, \ i, j = 1, 2, \ldots\}$ be a two-dimensional discrete
distribution, $p_{ij}$ being the probability of the point $(x_i, y_j)$. Select
two points, say $(x_1, y_1)$ and $(x_2, y_2)$, each with
positive probability and such that $x_1 \neq x_2$, $y_1 \neq y_2$. We can choose
a small $\varepsilon$ satisfying the relations $0 < \varepsilon \leq p_{11}$ and $0 < \varepsilon \leq p_{22}$.
Consider the set $q = \{q_{ij}, \ i, j = 1, 2, \ldots\}$ defined as follows:

$$q_{11} = p_{11} - \varepsilon, \quad q_{12} = p_{12} + \varepsilon, \quad q_{21} = p_{21} + \varepsilon, \quad q_{22} = p_{22} - \varepsilon$$

and for all other pairs $(i, j)$ we put $q_{ij} = p_{ij}$. Then it is easy to check that $q$
is a two-dimensional distribution. Moreover, the marginal
distributions of $q$ are the same as those of $p$ for each $\varepsilon$ as chosen
above, even though $p \neq q$.
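A small numerical instance of case (i), assuming a $2 \times 2$ table with all four probabilities equal to $\frac{1}{4}$ and $\varepsilon = \frac{1}{8}$ (both choices are purely illustrative):

```python
from fractions import Fraction

quarter = Fraction(1, 4)
p = [[quarter, quarter],
     [quarter, quarter]]

eps = Fraction(1, 8)  # any 0 < eps <= min(p[0][0], p[1][1]) works here
q = [[p[0][0] - eps, p[0][1] + eps],
     [p[1][0] + eps, p[1][1] - eps]]

def marginals(table):
    """Row sums and column sums of a two-dimensional probability table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols

print(marginals(p))
print(marginals(q))  # same marginals, different joint distribution
```

The perturbation adds and subtracts $\varepsilon$ once in every row and once in every column, so the row and column sums are unchanged.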


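(ii) Consider the following two functions:

$$f(x_1, x_2) = \begin{cases} \tfrac{1}{4}(1 + x_1 x_2), & \text{if } -1 \leq x_1 \leq 1, \ -1 \leq x_2 \leq 1 \\ 0, & \text{otherwise,} \end{cases}$$

$$g(x_1, x_2) = \begin{cases} \tfrac{1}{4}, & \text{if } -1 \leq x_1 \leq 1, \ -1 \leq x_2 \leq 1 \\ 0, & \text{otherwise.} \end{cases}$$

Then $f$ and $g$ are both two-dimensional probability density
functions. For the marginal densities we find

$$f_1(x_1) = \begin{cases} \tfrac{1}{2}, & \text{if } -1 \leq x_1 \leq 1 \\ 0, & \text{otherwise,} \end{cases} \qquad f_2(x_2) = \begin{cases} \tfrac{1}{2}, & \text{if } -1 \leq x_2 \leq 1 \\ 0, & \text{otherwise,} \end{cases}$$

$$g_1(x_1) = \begin{cases} \tfrac{1}{2}, & \text{if } -1 \leq x_1 \leq 1 \\ 0, & \text{otherwise,} \end{cases} \qquad g_2(x_2) = \begin{cases} \tfrac{1}{2}, & \text{if } -1 \leq x_2 \leq 1 \\ 0, & \text{otherwise;} \end{cases}$$

thus $f_1 = g_1$ and $f_2 = g_2$, but obviously $f \neq g$.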


(iii) Here is another specific but interesting example. For arbitrary
positive constants $a, b, c$ consider the functions

$$f(x, y) = \begin{cases} \dfrac{\Gamma(a + b + c)}{\Gamma(a)\Gamma(b)\Gamma(c)}\, x^{a-1} y^{b-1} (1 - x - y)^{c-1}, & \text{if } x \geq 0, \ y \geq 0, \ x + y \leq 1 \\ 0, & \text{otherwise} \end{cases}$$

and

$$\tilde{f}(x, y) = \begin{cases} \dfrac{1}{B(a, b + c)\, B(b, a + c)}\, x^{a-1} (1 - x)^{b+c-1} y^{b-1} (1 - y)^{a+c-1}, & \text{if } 0 \leq x \leq 1, \ 0 \leq y \leq 1 \\ 0, & \text{otherwise.} \end{cases}$$

Here $\Gamma(\cdot)$ and $B(\cdot, \cdot)$ are the well known gamma and beta functions
of Euler. Both $f$ and $\tilde{f}$ are two-dimensional probability density
functions. Note that $f$ is the density of the so-called Dirichlet
distribution. Denote by $(X, Y)$ and $(\tilde{X}, \tilde{Y})$ the random vectors
whose densities are $f$ and $\tilde{f}$ respectively. Direct computations show
that $X$ and $\tilde{X}$ have beta distribution with parameters $(a, b + c)$ and
$Y$ and $\tilde{Y}$ have beta distribution with parameters $(b, a + c)$. Thus,
again, the marginal densities of the vectors $(X, Y)$ and $(\tilde{X}, \tilde{Y})$ are
identical, but obviously $f \neq \tilde{f}$.


(iv) Suppose $F_1$ and $F_2$ are d.f.s with densities $f_1$ and $f_2$
respectively. Consider the function

$$f(x_1, x_2) = f_1(x_1) f_2(x_2)\,[1 + \varepsilon(2F_1(x_1) - 1)(2F_2(x_2) - 1)], \quad (x_1, x_2) \in \mathbb{R}^2$$

where $\varepsilon$ is an arbitrary number, $|\varepsilon| \leq 1$. Then $f$ is a density function
and for each $\varepsilon$ the marginal densities are $f_1$ and $f_2$ respectively.

The answer to the question formulated at the beginning of this
example can also be given in terms of the d.f.s only. Indeed, let $F_1$,
$F_2$ be any d.f.s and $\varepsilon$ any real number, $|\varepsilon| \leq 1$. Then by direct
computation we see that

$$F(x_1, x_2) = F_1(x_1) F_2(x_2)\,[1 + \varepsilon(1 - F_1(x_1))(1 - F_2(x_2))], \quad (x_1, x_2) \in \mathbb{R}^2$$

is a two-dimensional d.f. whose marginals are just $F_1$ and $F_2$.

(v) Let $F$ and $G$ be arbitrary d.f.s in $\mathbb{R}^1$. Define

$$H_1(x, y) = \max\{0, F(x) + G(y) - 1\}, \qquad H_2(x, y) = \min\{F(x), G(y)\}.$$

For any $c_1, c_2 \geq 0$ with $c_1 + c_2 = 1$ let

$$H(x, y) = c_1 H_1(x, y) + c_2 H_2(x, y).$$

Then it is not difficult to check that $H(x, y)$, $(x, y) \in \mathbb{R}^2$, are
two-dimensional d.f.s (Fréchet distributions). Moreover, any d.f. of this
class has $F$ and $G$ as its marginals.


Hence there are infinitely many multi-dimensional distributions 
with the same marginal distributions. In other words, the marginal 
distributions do not uniquely determine the corresponding joint 
distribution. The only exception is the case when the random 
vector consists of independent components. In this case, the joint 
distribution equals the product of the marginals. 


5.9. The continuity of a two-dimensional probability 
density does not imply that the marginal densities 
are continuous 


Let $f(x, y)$, $(x, y) \in \mathbb{R}^2$, be a probability density function which is
continuous. Denote by $f_1(x)$, $x \in \mathbb{R}^1$, and $f_2(y)$, $y \in \mathbb{R}^1$, the
corresponding marginal densities. There are problems which
require the use of the marginal densities $f_1$ and $f_2$ and their
continuity properties. Intuitively we might expect that $f_1$ and $f_2$ are
continuous if $f$ is. However, such a conjecture is not generally true.
Indeed, consider the following function:

$$f(x, y) = \frac{|x|}{2\sqrt{2\pi}} \exp\left(-|x| - \tfrac{1}{2} x^2 y^2\right), \quad (x, y) \in \mathbb{R}^2. \tag{1}$$

It is easy to check that $f$ is a probability density function. For the
first marginal density $f_1$ we find

$$f_1(x) = \begin{cases} 0, & \text{if } x = 0 \\ \tfrac{1}{2} \exp(-|x|), & \text{if } x \neq 0. \end{cases} \tag{2}$$

Clearly $f_1$ is discontinuous at $x = 0$ despite the fact that $f$ is
continuous.

Notice that the marginal density $f_1$ is discontinuous at one point
only. Now, using (1), we construct a new probability density
function which will be continuous, but one of whose marginal
densities will have infinitely many points of discontinuity.
Let $r_1, r_2, \ldots$ be the rational numbers in some order and let

$$g(x, y) = \sum_{n=1}^{\infty} 2^{-n} f(x - r_n, y). \tag{3}$$

Since $f$ given by (1) is bounded on $\mathbb{R}^2$, the series on the right-hand
side of (3) is uniformly convergent on $\mathbb{R}^2$. Moreover, $g$ is a
probability density function which is everywhere continuous. The
marginal density $g_1$ of $g$ is

$$g_1(x) = \sum_{n=1}^{\infty} 2^{-n} f_1(x - r_n) \tag{4}$$

with $f_1$ given by (2). The boundedness of $f_1$ implies the uniform
convergence of (4). It follows from (4) that $g_1$ is discontinuous at
all rational points $r_1, r_2, \ldots$, though it is continuous at every
irrational point of $\mathbb{R}^1$.
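The marginal density (2) can be checked by numerical integration. The sketch below assumes that the density in (1) carries the factor $|x|$, that is $f(x, y) = \frac{|x|}{2\sqrt{2\pi}} \exp(-|x| - \frac{1}{2}x^2y^2)$, which is what makes (2) come out; the integration range and step count in `marginal_f1` are arbitrary numerical choices.

```python
import math

def f(x, y):
    """Assumed joint density: |x| / (2 sqrt(2 pi)) * exp(-|x| - x^2 y^2 / 2)."""
    return abs(x) / (2.0 * math.sqrt(2.0 * math.pi)) * math.exp(-abs(x) - 0.5 * x * x * y * y)

def marginal_f1(x, half_width=200.0, steps=40_000):
    """Trapezoidal approximation of the integral of f(x, y) over y in [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.5 * (f(x, -half_width) + f(x, half_width))
    for k in range(1, steps):
        total += f(x, -half_width + k * h)
    return total * h

for x in (0.5, 0.1, 0.0):
    print(x, marginal_f1(x), 0.5 * math.exp(-abs(x)))
```

For $x \neq 0$ the numerical value agrees with $\frac{1}{2}\exp(-|x|)$, whereas at $x = 0$ the integrand vanishes identically and the marginal is 0, which is exactly the discontinuity described above.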


5.10. The convolution of a unimodal probability density 
function with itself is not always unimodal 


We present two examples and then discuss them briefly. 


(i) Consider the following function: 


, i=. 
— 6 


It is easy to see that $f$ is a probability density function which is
unimodal. Direct calculation shows that the convolution
$f^{*2}(x) = (f * f)(x)$ is:


ate. > 5 
0) if ¢<—ye ands > = 
2 2 oth ee = ath 
Zor +3, if —FR< LS —3 
-l5a+4, if -t<2xr<0 
aD ¢ ; 
eo 
r+ : if tees 
a) 
Or + = if sy <2 
a= 2 ae 
3 5 
Obviously _f*2 has two local maxima at 
a a ee ee ee eee #2 (4) _ 17 
- 39 and = 5. f"(—35) = G L(G) =7 and 


minimum equal to + at the point x = 0. 

Hence the convolution operation does not preserve unimodality. 
(ii) Suppose $a$ and $b$ are positive numbers. Denote by $u_a$ and $v_a$ the
densities of uniform distributions on $(0, a)$ and $(-\frac{1}{2}a, \frac{1}{2}a)$
respectively. Let $f = \frac{1}{2}(u_a + u_b)$, $g = \frac{1}{2}(v_a + v_b)$. Then each of $f$ and
$g$ is a unimodal density and, moreover, $g$ is symmetric. We want to
know whether the convolution $f * g$ is unimodal. To see this we use
the equality

$$f * g = \tfrac{1}{4}\,[(u_a * v_a) + (u_b * v_b) + (u_a * v_b) + (u_b * v_a)].$$

Considering separately each of the terms on the right-hand side of
this representation we arrive at the following conclusions:

(1) $u_a * v_a$ linearly decreases on $(\frac{1}{2}a, \frac{3}{2}a)$ with slope $(-a^{-2})$ and
vanishes on $(\frac{3}{2}a, \infty)$;

(2) $u_b * v_b$ linearly increases on $(-\frac{1}{2}b, \frac{1}{2}b)$ with slope $b^{-2}$;

(3) $u_a * v_b$ is constant on $(a - \frac{1}{2}b, \frac{1}{2}b)$;

(4) $u_b * v_a$ is constant on $(\frac{1}{2}a, b - \frac{1}{2}a)$ and then decreases linearly.

Now choose the parameters $a$, $b$ such that $b > 3a$. From (1)-(4)
it follows that $f * g$ is decreasing in the interval $(\frac{1}{2}a, \frac{3}{2}a)$ and is
increasing in $(\frac{3}{2}a, \frac{1}{2}b)$, but this means that $f * g$ is not unimodal.

Let us note that in case (i) the density $f$ is unimodal but not
symmetric, while in case (ii) both densities $f$ and $g$ are unimodal, $g$
is symmetric and $f$ is not symmetric. We have seen that the
convolutions $f * f$ and $f * g$ are not unimodal. Thus in general the
convolution operation does not preserve the unimodality property.
Note that if $f$ and $g$ are unimodal densities and both are symmetric
then their convolution $f * g$ is unimodal (Lukacs 1970;
Dharmadhikari and Joag-Dev 1988).
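Case (ii) can be illustrated with explicit numbers. The sketch below evaluates $f * g$ exactly, using the fact that the convolution of the uniform densities on $(0, a)$ and $(-\frac{1}{2}c, \frac{1}{2}c)$ at $x$ equals the length of $(0, a) \cap (x - \frac{1}{2}c, x + \frac{1}{2}c)$ divided by $ac$. The values $a = 1$, $b = 4$ (so that $b > 3a$) and the function names are illustrative assumptions.

```python
def conv_uniform(a, c, x):
    """(u_a * v_c)(x): density at x of U(0, a) + U(-c/2, c/2) for independent summands."""
    overlap = min(a, x + c / 2.0) - max(0.0, x - c / 2.0)
    return max(0.0, overlap) / (a * c)

def f_conv_g(a, b, x):
    """(f * g)(x) for f = (u_a + u_b)/2 and g = (v_a + v_b)/2."""
    return 0.25 * (conv_uniform(a, a, x) + conv_uniform(b, b, x)
                   + conv_uniform(a, b, x) + conv_uniform(b, a, x))

a, b = 1.0, 4.0
# Points a/2, 3a/2 and b/2: ends of the decreasing and increasing stretches.
h_left, h_mid, h_right = (f_conv_g(a, b, x) for x in (0.5, 1.5, 2.0))
print(h_left, h_mid, h_right)
```

The value at $\frac{3}{2}a$ is smaller than the values at $\frac{1}{2}a$ and at $\frac{1}{2}b$, so $f * g$ has an interior local minimum and cannot be unimodal.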


5.11. The convolution of unimodal discrete distributions 
is not always unimodal 


Recall first the definition of unimodality. Let $\mathcal{P} = \{p_n, n \in \mathbb{N}_0\}$
be a probability distribution on the set of the non-negative integer
numbers $\mathbb{N}_0$ or on some subset (or even on a countable subset of
$\mathbb{R}^1$). We say that $\mathcal{P}$ is unimodal if there is an integer $k_0$ such that $p_k$
is non-decreasing for $k \leq k_0$ and non-increasing for $k \geq k_0$. The
value $k_0$ is called a mode. We wish to know if the unimodal
property is preserved under the convolution operation. Example
5.10 shows that the answer is negative in the absolutely continuous
case. Let us find the answer in the discrete case.

Consider two independent r.v.s, say $\xi$ and $\eta$, with values in the
sets $\{0, 1, \ldots, m\}$ and $\{0, 1, \ldots, n\}$ respectively, where $m \geq 2$ and
$n \geq 2$. For the probabilities $p_i = P[\xi = i]$ and $q_j = P[\eta = j]$ we
suppose that

$$p_0 = \frac{m + 2}{2m + 2}, \quad p_1 = \ldots = p_m = \frac{1}{2m + 2}; \qquad q_0 = \frac{n + 2}{2n + 2}, \quad q_1 = \ldots = q_n = \frac{1}{2n + 2}.$$

Then each of the distributions $P_\xi = \{p_i, i = 0, 1, \ldots, m\}$ and
$P_\eta = \{q_j, j = 0, 1, \ldots, n\}$ is unimodal. The sum $\theta = \xi + \eta$ is a r.v. with
values in the set $\{0, 1, \ldots, m + n\}$ and its distribution
$P_\theta = \{r_k, k = 0, 1, \ldots, m + n\}$, in view of the independence of $\xi$ and $\eta$, is equal to
the convolution of $P_\xi$ and $P_\eta$: $P_\theta = P_\xi * P_\eta$. This means that

$$r_k = P[\theta = k] = P[\xi + \eta = k] = \sum p_i q_j, \quad k = 0, 1, \ldots, m + n$$

where the summation is over all $i \in \{0, 1, \ldots, m\}$ and all $j \in \{0, 1, \ldots, n\}$
with $i + j = k$. In particular we can easily find that

$$r_0 = p_0 q_0 = \frac{(m + 2)(n + 2)}{(2m + 2)(2n + 2)}, \qquad r_1 = p_0 q_1 + p_1 q_0 = \frac{m + n + 4}{(2m + 2)(2n + 2)},$$

$$r_2 = p_0 q_2 + p_1 q_1 + p_2 q_0 = \frac{m + n + 5}{(2m + 2)(2n + 2)}.$$

Comparing $r_0$, $r_1$ and $r_2$ we see that $r_0 > r_1$ but $r_1 < r_2$. Even
without additional calculations this is enough to conclude that the
distribution $P_\theta$, that is the convolution $P_\xi * P_\eta$, is not unimodal
even though both $P_\xi$ and $P_\eta$ are unimodal.
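The first terms of the convolution can be computed exactly. The sketch below assumes the distributions $p_0 = \frac{m+2}{2m+2}$, $p_1 = \ldots = p_m = \frac{1}{2m+2}$ and the analogous $q_j$ with $n$ in place of $m$; the values $m = 3$, $n = 4$ and the helper names are illustrative.

```python
from fractions import Fraction

def one_heavy_atom(m):
    """Distribution on {0, ..., m}: mass (m+2)/(2m+2) at 0 and 1/(2m+2) elsewhere."""
    return [Fraction(m + 2, 2 * m + 2)] + [Fraction(1, 2 * m + 2)] * m

def convolve(p, q):
    """Distribution of the sum of independent r.v.s with distributions p and q."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            r[i + j] += p_i * q_j
    return r

m, n = 3, 4
r = convolve(one_heavy_atom(m), one_heavy_atom(n))
print(r[0], r[1], r[2])  # r_0 > r_1 < r_2, so the convolution is not unimodal
```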


5.12. Strong unimodality is a stronger property than the 
usual unimodality 


The d.f. $G$ is called strongly unimodal if the convolution $G * F$ is
unimodal for every unimodal $F$. (This notion was introduced by I.
Ibragimov in 1956.)

Note that several useful distributions are indeed strongly
unimodal: the normal distribution $N(a, \sigma^2)$; the uniform distribution
on the interval $[a, b]$; the gamma distribution with a shape
parameter $a \geq 1$; the beta distribution with parameters $(a, b)$, $a \geq 1$,
$b \geq 1$, etc.

However we have seen (see Example 5.10) that the convolution
of two unimodal distributions is, in general, not unimodal. This
implies that strong unimodality is a stronger property than (usual)
unimodality. Obviously, Example 5.10 deals with absolutely
continuous distributions. Hence it is of interest to consider such a
case involving discrete distributions.

Let $F_k$ denote the uniform distribution on the finite set $\{0, 1, \ldots, k\}$
and let $F = \frac{1}{2}(F_0 + F_{m-1})$ for a fixed $m \geq 3$. Then $F$ is unimodal
and our goal is to look at the convolution $G = F * F$. The
distribution $G$ is concentrated on the set $\{0, 1, 2, \ldots, 2m - 2\}$ and if
$g_i$, $i = 0, 1, 2, \ldots, 2m - 2$, are the masses of its 'atoms', then we
easily find that

$$4g_0 = 1 + 2m^{-1} + m^{-2}, \qquad 4g_1 = 2m^{-1} + 2m^{-2}, \qquad 4g_2 = 2m^{-1} + 3m^{-2}.$$

It follows immediately that

$$g_0 - g_1 = \tfrac{1}{4}(1 - m^{-2}) > 0, \qquad g_1 - g_2 = \tfrac{1}{4}(-m^{-2}) < 0.$$

Thus $g_1 < \min\{g_0, g_2\}$ and therefore the distribution $G = F * F$ is
not unimodal. In other words, $F$ is unimodal but not strongly
unimodal.
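The masses $g_0, g_1, g_2$ can be confirmed for a concrete $m$. The sketch below assumes $F = \frac{1}{2}(F_0 + F_{m-1})$, that is mass $\frac{1}{2} + \frac{1}{2m}$ at 0 and $\frac{1}{2m}$ at each of $1, \ldots, m - 1$; the value $m = 5$ is an arbitrary choice.

```python
from fractions import Fraction

m = 5
# F = (F_0 + F_{m-1}) / 2 on the set {0, 1, ..., m-1}
F = [Fraction(1, 2) + Fraction(1, 2 * m)] + [Fraction(1, 2 * m)] * (m - 1)

# G = F * F on the set {0, 1, ..., 2m-2}
G = [Fraction(0)] * (2 * m - 1)
for i, f_i in enumerate(F):
    for j, f_j in enumerate(F):
        G[i + j] += f_i * f_j

print(4 * G[0], 4 * G[1], 4 * G[2])
print(G[0] - G[1], G[1] - G[2])  # positive, then negative
```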


5.13. Every unimodal distribution has a unimodal 
concentration function, but the converse does not 
hold 


Let X be a r.v. with a d.f. F and μ_F be the measure on (R^1, ℬ^1)
induced by F. Recall that the function

(1)  Q_F(l) = 0,                                   if l < 0
     Q_F(l) = sup_{x ∈ R^1} μ_F([−l/2, l/2] + x),  if l ≥ 0

is said to be a concentration function (of P. Lévy) corresponding to
F and also to μ_F. (Here the sum of sets is defined in the usual
sense: A + B = {a + b: a ∈ A, b ∈ B}.) Important results
concerning concentration functions and their applications have
been summarized by Hengartner and Theodorescu (1973). From
(1) we can easily derive that Q_F(l), l ∈ R^1 is a d.f.

Let us mention the following result (Hengartner and
Theodorescu 1973). If F(x), x ∈ R^1 is a unimodal d.f. with mode x*
= 0, then the concentration function Q_F(l), l ∈ R^1 is unimodal with
mode l* = 0.

By a concrete example we can show that the converse is not
always true. We give below the d.f. F and its concentration
function Q_F calculated by (1), namely:


Clearly Q_F is unimodal but F is not unimodal.


SECTION 6. EXPECTATIONS AND CONDITIONAL 
EXPECTATIONS 


For any r.v. X on a given probability space (Ω, ℱ, P) we can define
an important characteristic which is called an expectation, or an
expected value, and is denoted by EX. If X ≥ 0 and P[X = ∞] > 0
we put EX = ∞, while if P[X = ∞] = 0 we define


(1)  EX = lim_{n→∞} Σ_{i=1}^{∞} (i/2^n) P[i/2^n < X ≤ (i + 1)/2^n].


For an arbitrary r.v. X let X^+ = max{X, 0} and X^− = max{−X, 0}.
Since X^+ and X^− are non-negative, their expectations E[X^+] and
E[X^−] can be obtained by (1), and if either E[X^+] < ∞ or E[X^−] <
∞ then

EX = E[X^+] − E[X^−].


The expectation EX is also called the Lebesgue integral of the ℱ-
measurable function X with respect to the probability measure P.
We say that the expectation of X is finite if both E[X^+] and E[X^−]
are finite. Since |X| = X^+ + X^−, the finiteness of EX is equivalent to
E[|X|] < ∞. In this case the r.v. X is said to be integrable. If X is
absolutely continuous with a density f, then X is integrable iff
∫_{−∞}^{∞} |x| f(x) dx < ∞ and EX = ∫_{−∞}^{∞} x f(x) dx. If X is discrete,
P[X = x_n] = p_n, p_n > 0, n = 1, 2, ..., Σ_n p_n = 1, then X is
integrable iff Σ_n |x_n| p_n < ∞ and EX = Σ_n x_n p_n.

For some purposes it is necessary to consider the integral of X
over the set A ∈ ℱ. In such a case
∫_A X dP = ∫_Ω X(ω) 1_A(ω) dP(ω).

It is convenient to introduce here the space L^r(Ω, ℱ, P), or
simply L^r, of all r-integrable r.v.s where r > 0 and X ∈ L^r iff
E[|X|^r] < ∞.
In addition to the expectation EX, important characteristics of
the r.v. X are the numbers (if defined)

E[(X − c)^k],  E[|X − c|^k],  k = 1, 2, ...,  c ∈ R^1

which are known as the kth non-central moment and kth non-
central absolute moment of X about c respectively. If c = EX these
moments are called central. In this section and later we use the
notation m_k = E[X^k] for the kth moment of X. In the particular case
when k = 2 and c = EX we get the quantity E[(X − EX)^2] which is
said to be the variance of X and is denoted by VX: VX = E[(X −
EX)^2].

The expectation possesses several properties. We mention here
only a few of them. If X_1 and X_2 are integrable r.v.s and c_1, c_2 ∈ R^1
then X_1 + X_2 and c_i X_i are also integrable and

E[c_1 X_1 + c_2 X_2] = c_1 EX_1 + c_2 EX_2  (linearity),
EX_1 ≤ EX_2 if X_1 ≤ X_2  (monotonicity).

Other properties such as additivity over disjoint sets and different
kinds of convergence theorems can be found in the literature
(Chung 1974; Chow and Teicher 1978; Laha and Rohatgi 1979;
Shiryaev 1995).

The family {X_n, n ≥ 1} of r.v.s is said to be uniformly
integrable if

sup_n ∫_{[|X_n| > a]} |X_n| dP(ω) → 0 as a → ∞

or, in another equivalent form, if

sup_n E[|X_n| 1_{[|X_n| > a]}] → 0 as a → ∞.

Suppose now that X is a r.v. on the given probability space and
𝒟 is a sub-σ-field of ℱ. Following the same steps as for the
definition of the expectation EX, we can define the conditional
expectation E[X|𝒟] of the r.v. X with respect to the σ-field 𝒟. So, if
X is an integrable r.v., then E[X|𝒟] is a 𝒟-measurable r.v. such that
for every A ∈ 𝒟 we have

∫_A E[X|𝒟] dP = ∫_A X dP.

The existence of E[X|𝒟], up to equivalence, is a consequence of
the Radon-Nikodym theorem (Chow and Teicher 1978; Shiryaev
1995). Here are some properties of conditional expectations:


(i) if X = c a.s. where c = constant, then E[X|𝒟] = c a.s.;
(ii) X_1 ≤ X_2 a.s. ⇒ E[X_1|𝒟] ≤ E[X_2|𝒟] a.s.;
(iii) if X_1 and X_2 are integrable r.v.s and c_1, c_2 ∈ R^1, then
E[c_1 X_1 + c_2 X_2|𝒟] = c_1 E[X_1|𝒟] + c_2 E[X_2|𝒟] a.s.;
(iv) E{E[X|𝒟]} = EX;
(v) if 𝒟_1 ⊂ 𝒟_2 ⊂ ℱ, then E{E[X|𝒟_2]|𝒟_1} = E[X|𝒟_1] a.s.;
(vi) if X is independent of the σ-field 𝒟 (that is, X is independent of I_A, A ∈ 𝒟),
then E[X|𝒟] = EX a.s.;
(vii) if X is 𝒟-measurable and E[|XY|] < ∞, then
E[XY|𝒟] = X E[Y|𝒟] a.s.


Finally, let us mention an important particular case of the
conditional expectation. For any event A ∈ ℱ the conditional
expectation E[I_A|𝒟] is denoted by P(A|𝒟) and is called the
conditional probability of the event A with respect to the σ-field 𝒟
(also see Example 2.4).

This section includes examples devoted to various properties of
expectations, conditional expectations and moments (in both one-
dimensional and multidimensional cases). The Fubini theorem is
introduced and analysed in Example 6.7, and conditional medians
are considered in Example 6.12.


6.1. On the linearity property of expectations 


If one operates with expectations such as E[X + Y] and E[X + Y +
Z] it is generally accepted that E[X + Y] = EX + EY and E[X + Y +
Z] = EX + EY + EZ. (Analogous relations can be written for more
than three terms.) This is just the so-called linearity property
of expectations. Its meaning is that the value of E[·] depends on the
variables in [·] only through their marginal distributions.

Recall that in the case of two r.v.s the linearity holds if E[X + Y]
is defined (in the sense that E[(X + Y)^+] and/or E[(X + Y)^−] are
finite). Of course, if EX and EY both exist then E[X + Y] exists and
equals their sum. Moreover, the linearity holds even when EX and
EY are not defined, or if one of them equals +∞ and the other
equals −∞ (Simons 1977).

Now the question is: what happens if we consider three 
variables? Does the linearity property of expectations still remain 
valid? The answer will follow from the example below. 

Let ξ denote a r.v. distributed uniformly on [0, 1]. Then 1 − ξ and
η = |2ξ − 1| have the same distribution as ξ. Define three new r.v.s,
say X, Y and Z, in two different ways.


Case I. X = Y = tan(πξ/2), Z = −2X.
Case II. X' = tan(πξ/2), Y' = tan(π(1 − ξ)/2), Z' = −2 tan(πη/2).


It is evident that X and X', Y and Y', Z and Z' are pairwise equal in
distribution. Our purpose now is to find the expectations
E[X + Y + Z] and E[X' + Y' + Z']. In Case I, X + Y + Z = 0 and hence
E[X + Y + Z] = 0. In Case II we have: Y' = cot(πξ/2),
Z' = tan(πξ/2) − cot(πξ/2) if 0 < ξ < 1/2 and Z' = cot(πξ/2) −
tan(πξ/2) if 1/2 ≤ ξ < 1. Thus X' + Y' + Z' = 2 tan(πξ/2) = 2X' if
0 < ξ < 1/2 and X' + Y' + Z' = 2 cot(πξ/2) = 2Y' if 1/2 ≤ ξ < 1. Hence
P[X' + Y' + Z' > 0] = 1. Moreover, it is easy to see that
E[X' + Y' + Z'] = (4/π) log 2.
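
The two cases can be compared numerically. The sketch below is only an illustration, assuming ξ is uniform on [0, 1] and η = |2ξ − 1| as above: it integrates X' + Y' + Z' over (0, 1) with a midpoint rule and compares the result with (4/π) log 2. The function names and the grid size are arbitrary choices, not part of the original text.

```python
import math


def case_one_sum(xi):
    """X + Y + Z in Case I: X = Y = tan(pi*xi/2), Z = -2X."""
    x = math.tan(math.pi * xi / 2.0)
    return x + x - 2.0 * x


def case_two_sum(xi):
    """X' + Y' + Z' in Case II as a function of xi in (0, 1)."""
    eta = abs(2.0 * xi - 1.0)
    x = math.tan(math.pi * xi / 2.0)
    y = math.tan(math.pi * (1.0 - xi) / 2.0)
    z = -2.0 * math.tan(math.pi * eta / 2.0)
    return x + y + z


def midpoint_mean(func, n=200_000):
    """Midpoint-rule approximation of the integral of func over (0, 1)."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))


estimate = midpoint_mean(case_two_sum)
exact = 4.0 / math.pi * math.log(2.0)
print(estimate, exact)
```

Although Y' and Z' are individually unbounded near ξ = 0 and ξ = 1, their sum is bounded, which is why a plain midpoint rule is adequate here.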

Comparing the results from Cases I and II we see that the
linearity property described above for two r.v.s can fail for three
variables. Note that if one considers X̃ = X + Y and Ỹ = Z then
E[X + Y + Z] = E[X̃ + Ỹ] and E[X̃ + Ỹ], when defined, depends on X̃
and Ỹ only through the distribution of X̃ and the distribution of Ỹ.
But X̃ = X + Y and thus the value of the expectation E[X + Y + Z],
when defined, depends on X, Y, Z through the bivariate distribution
of X and Y, and the distribution of Z.

The reader could try to clarify how the linearity property of
expectations is expressed when considering more than three
variables. In general we have to be careful when taking
expectations of even such simple expressions as sums of r.v.s.


6.2. An integrable sequence of non-negative random 
variables need not have a bounded supremum 

Let {X_n, n ≥ 1} be a sequence of non-negative r.v.s such that for
some p > 0, X_n^p is integrable for each n, and, moreover, let
sup_n E[X_n^p] < ∞. Then intuitively we could expect that the
variables X_n, as well as sup_n X_n, are bounded. Let us show that
such a conjecture need not be true.

Consider the sequence X_1, X_2, ... of i.i.d. r.v.s whose common
d.f. is F(x) = 0 if x ≤ 0 and F(x) = 1 − e^{−x} if x > 0 (exponential
distribution with parameter 1). Then for any p > 0 we have E[X_n^p]
= Γ(p + 1) < ∞ and thus sup_n E[X_n^p] < ∞. Further, for x > 0 and m
= 1, 2, ... we find

P[max_{1≤j≤m} X_j ≤ x] = (P[X_1 ≤ x])^m = (1 − e^{−x})^m.

Passing to the limit in both parameters m and x we get

lim_{m→∞} P[max_{1≤j≤m} X_j ≤ x] = P[sup_j X_j ≤ x] = 0 for all x > 0

and

lim_{x→∞} P[sup_j X_j ≤ x] = P[sup_j X_j < ∞] = 0.

Therefore we have shown that in general the integrability of any
order p > 0 of members of the sequence {X_n, n ≥ 1} does not
imply boundedness of the supremum of this sequence.
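
A small numerical illustration, offered only as a sketch for i.i.d. exponential variables with parameter 1: for a fixed level x the probability P[max_{1≤j≤m} X_j ≤ x] = (1 − e^{−x})^m collapses to 0 as m grows, while every moment E[X_n^p] = Γ(p + 1) stays finite. The function names and the level x = 10 are arbitrary.

```python
import math


def prob_max_below(x, m):
    """P[max(X_1, ..., X_m) <= x] for i.i.d. Exp(1) variables."""
    return (1.0 - math.exp(-x)) ** m


def moment(p):
    """E[X^p] = Gamma(p + 1) for X ~ Exp(1)."""
    return math.gamma(p + 1.0)


level = 10.0
for m in (10, 10**3, 10**6, 10**9):
    print(m, prob_max_below(level, m))
print("E[X^2] =", moment(2.0))
```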


6.3. A necessary condition which is not sufficient for the 
existence of the first moment 


Let X be a r.v. with d.f. F. It is well known and easy to check that
the condition lim_{x→∞} x(1 − F(x)) = 0 is necessary for the existence
of the expectation EX. Thus we arrive at the inverse question: if F
is such that x(1 − F(x)) → 0 as x → ∞, does this imply that EX
exists? The example below shows that in general the answer is
negative. To see this take the following d.f.:


Direct reasoning shows that x(1 − F(x)) → 0 as x → ∞ while
∫_0^∞ (1 − F(x)) dx = ∞ and since EX = ∫_0^∞ (1 − F(x)) dx,
then EX does not exist.

We can say even more: if E[|X|^α] < ∞ for some α > 0, then
necessarily n^α P[|X| > n] → 0 as n → ∞ (e.g. see Rohatgi 1976).

Let us take α = 1 and illustrate once again that a condition like
nP[X > n] → 0 as n → ∞ is not sufficient for the existence of EX.
Indeed, let us consider the following discrete r.v. X defined by P[X
= n] = c/(n^2 log n), n = 2, 3, ..., where c is a norming constant. We
can then show that for large n, P[X > n] ~ c/(n log n), implying that
nP[X > n] → 0 as n → ∞. However, EX should be equal to
Σ_{n=2}^{∞} c/(n log n) and the divergence of this series shows that
the expectation EX does not exist.
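
The behaviour of this discrete example can be explored numerically. The sketch below is an illustration under stated assumptions: it works with the unnormalised masses 1/(n^2 log n), so all quantities are proportional (by the factor c) to the true ones, and it truncates the tail sums at an arbitrary point N. The names N, scaled_tail and partial_mean are hypothetical.

```python
import math

N = 200_000  # truncation point for the tail sums (an arbitrary choice)

# Unnormalised masses w_n = 1 / (n^2 log n), n = 2, ..., N.
weights = {n: 1.0 / (n * n * math.log(n)) for n in range(2, N + 1)}


def scaled_tail(n):
    """n * sum_{n < k <= N} w_k, proportional to n * P[X > n] up to truncation."""
    return n * sum(weights[k] for k in range(n + 1, N + 1))


def partial_mean(n):
    """sum_{k <= n} k * w_k = sum 1/(k log k), proportional to the partial sum of EX."""
    return sum(k * weights[k] for k in range(2, n + 1))


for n in (10, 100, 1000, 10_000):
    print(n, scaled_tail(n), partial_mean(n))
```

The scaled tail decreases slowly, while the partial sums keep growing, roughly like log log n, so the divergence is real but very slow and cannot be seen from a short table alone.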


Finally, note that if n^{α+δ} P[|X| > n] → 0 as n → ∞ for some δ >
0, then E[|X|^α] does exist.


6.4. A condition which is sufficient but not necessary for 
the existence of moment of order (— 1) of a random 
variable 


The moments of negative orders of r.v.s are used in some 
probabilistic problems and it is of interest to know the conditions 
which ensure their existence. 

If X is a r.v. with a discrete distribution having positive mass at
0, then E[X^{−1}] is infinite. The same holds if X is absolutely
continuous and its density f satisfies the condition f(0) > 0. The
following useful result is proved by Piegorsch and Casella (1985):
let X be a r.v. with density f(x), x ∈ (0, ∞) which is continuous and
satisfies the condition

(1)  lim_{x→0} x^{−α} f(x) = 0 for some α > 0

then E[X^{−1}] < ∞.

By an example we aim to show that E[X^{−1}] can be finite even if
(1) fails: that is, in general, condition (1) is sufficient but not
necessary for the moment of order minus one to be finite. Indeed,
define the family of functions {f_n, n ≥ 1} by

(2)  f_n(x) = |log^n x|^{−1} / ∫_0^c |log^n u|^{−1} du,  0 < x < c

where c = constant, c ∈ (0, 1). It is easy to check that for each n, f_n
is a probability density function of some r.v., say X_n. Since

∫_0^c |log^n u|^{−1} du < ∞,  lim_{x→0} f_n(x) = 0,  lim_{x→0} x^{−α} f_n(x) = ∞

for every α > 0, it follows that (1) is not satisfied. It then remains
for us to determine whether E[X_n^{−1}] exists. By (2) we find that
E[X_n^{−1}] is finite iff

(3)  ∫_0^c x^{−1} |log^n x|^{−1} dx < ∞.

For n = 1 the integral in (3) diverges for all c ∈ (0, 1), but if n = 2,
3, ..., this integral is finite for any c ∈ (0, 1). Consequently E[X_n^{−1}]
< ∞ iff n ≥ 2. So, for n ≥ 2, we have E[X_n^{−1}] < ∞ but condition (1)
does not hold.


6.5. An absolutely continuous distribution need not be 
symmetric even though all its central odd-order 
moments vanish 


Let F(x), x ∈ R^1 be an absolutely continuous d.f. with a density f.
Suppose F is symmetric, that is F(−x) = 1 − F(x), or, equivalently,
f(−x) = f(x) for all x ∈ R^1. Suppose F has moments of all orders.
Then the central odd-order moments of F

m_{2n+1} = E[X^{2n+1}] = ∫_{−∞}^{∞} x^{2n+1} f(x) dx

are zero for all n = 0, 1, ... since the integrand x^{2n+1} f(x) is an odd
function and the integral is taken over the interval (−∞, ∞), which is
symmetric with respect to the origin 0.


Suppose now that the distribution G(x), x ∈ R^1 has all its central
odd-order moments vanishing. The question is: does it follow from
this condition that G is symmetric? The answer is negative as
illustrated by the following example.


Let the function g(x), x ∈ R^1 be defined by

(1)  g(x) = (1/48) exp(−|x|^{1/4}) (1 + sin |x|^{1/4}),  if x < 0
     g(x) = (1/48) exp(−x^{1/4}) (1 − sin x^{1/4}),      if x ≥ 0.

It is easy to verify that g is a probability density function. Denote
by Y a r.v. with this density. Then we can calculate explicitly the
moments m_n = E[Y^n] for each n, n = 0, 1, .... The result is

m_{2n+1} = 0,  m_{2n} = (1/6)(8n + 3)!.

Thus all central odd-order moments of Y are zero, but obviously
the distribution of Y defined by the density (1) is not symmetric.
(Also see Example 11.12.)
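
The moments can be verified exactly. The sketch below is an illustration that assumes the density g(x) = (1/48) exp(−|x|^{1/4})(1 + sin |x|^{1/4}) for x < 0 and (1/48) exp(−x^{1/4})(1 − sin x^{1/4}) for x ≥ 0. After the substitution |x| = t^4 every moment reduces to integrals of t^k e^{−t} and t^k e^{−t} sin t over (0, ∞), and the latter equals k! Im[(1 + i)^{k+1}] / 2^{k+1}, which is evaluated with integer arithmetic. All function names are hypothetical.

```python
from fractions import Fraction
from math import exp, factorial, sin


def imag_power_one_plus_i(n):
    """Imaginary part of (1 + i)^n, computed exactly with integers."""
    re, im = 1, 0
    for _ in range(n):
        re, im = re - im, re + im  # multiply by (1 + i)
    return im


def sine_integral(k):
    """Exact integral of t^k e^{-t} sin(t) over (0, inf)."""
    return Fraction(factorial(k) * imag_power_one_plus_i(k + 1), 2 ** (k + 1))


def moment(j):
    """E[Y^j] for the density g, using the substitution |x| = t^4."""
    k = 4 * j + 3
    plain = 4 * factorial(k)        # integral of |x|^j exp(-|x|^(1/4)) over a half-line
    wobble = 4 * sine_integral(k)   # same integrand multiplied by sin(|x|^(1/4))
    negative_half = (-1) ** j * (plain + wobble)  # factor (1 + sin) on x < 0
    positive_half = plain - wobble                # factor (1 - sin) on x >= 0
    return Fraction(1, 48) * (negative_half + positive_half)


def g(x):
    """The density itself, used only to exhibit the asymmetry."""
    r = abs(x) ** 0.25
    sign = 1.0 if x < 0 else -1.0
    return exp(-r) * (1.0 + sign * sin(r)) / 48.0


print([moment(j) for j in range(4)])
print(g(1.0), g(-1.0))
```

The sine contribution vanishes for every j because (1 + i)^{4(j+1)} = (−4)^{j+1} is real, which is exactly why the odd moments cancel although g(x) ≠ g(−x).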


6.6. A property of the moments of random variables 
which does not have an analogue for random vectors 


Let (X_1, ..., X_n) be a random vector on a given probability space
(Ω, ℱ, P). Let k_1, ..., k_n be non-negative integers. If
E[|X_1^{k_1} ... X_n^{k_n}|] exists, then the number

m_{k_1...k_n} = E[X_1^{k_1} ... X_n^{k_n}]

is called a (k_1, ..., k_n)th mixed central moment of the random
vector (X_1, ..., X_n) and k = k_1 + ... + k_n is its order.

If n = 1 we have one r.v. X_1 only, and it is well known that the
existence of the kth moment m_k implies the existence of all
moments m_j for 0 ≤ j ≤ k. It suffices to recall the Lyapunov
inequality (E[|X_1|^j])^{1/j} ≤ (E[|X_1|^k])^{1/k}, 0 < j ≤ k, or to use the
elementary inequality |x|^j ≤ 1 + |x|^k, x ∈ R^1, 0 < j ≤ k. This
observation in the one-dimensional case leads to the following
question: does a similar statement hold in the multi-dimensional
case? The answer is negative and can be expressed as follows: in
the case n > 1 the existence of a moment m_{k_1...k_n} does not imply
the existence of all moments m_{j_1...j_n} for 0 ≤ j_i ≤ k_i, i = 1, ..., n. To
see this, take Ω = (0, 1), ℱ = ℬ_{(0,1)} and P the Lebesgue measure.
For fixed numbers c_1 and c_2, 0 < c_1 ≤ c_2 < 1, define the following
r.v.s:

X_1(ω) = ω^{−1},        if 0 < ω < c_2
X_1(ω) = 0,             if c_2 ≤ ω < 1,

X_2(ω) = 0,             if 0 < ω < c_1
X_2(ω) = (1 − ω)^{−1},  if c_1 ≤ ω < 1.

It is easy to check that the product X_1 X_2 is integrable, but neither
X_1 nor X_2 is. Thus the moment m_{1,1} of the vector (X_1, X_2) exists,
but m_{1,0} and m_{0,1} do not exist. Obviously, if c_1 < c_2, then m_{1,1} > 0
and if c_1 = c_2, then m_{1,1} = 0.


6.7. On the validity of the Fubini theorem 


Let (Ω_1, ℱ_1, P_1) and (Ω_2, ℱ_2, P_2) be two probability spaces. Then
there exists only one probability P on the product (Ω_1 × Ω_2, ℱ_1 ×
ℱ_2) such that

P(A_1 × A_2) = P_1(A_1) P_2(A_2),  A_1 ∈ ℱ_1,  A_2 ∈ ℱ_2.

Further, for every non-negative (or quasi-integrable) r.v. X defined
on the product space (Ω_1 × Ω_2, ℱ_1 × ℱ_2, P), the following formula
is both meaningful and valid:

(1)  ∫ X dP = ∫_{Ω_1} P_1(dω_1) ∫_{Ω_2} X_{ω_1}(ω_2) P_2(dω_2)
            = ∫_{Ω_2} P_2(dω_2) ∫_{Ω_1} X_{ω_2}(ω_1) P_1(dω_1)

(for the proof see the books of Gihman and Skorohod (1974/1979)
and Neveu (1965)).

Our purpose now is to show that the assumption that ∫ X dP
exists is essential for the validity of (1). Let Z ≥ 0 be a non-
integrable r.v. on (Ω_1, ℱ_1, P_1) and define the variable X on the
product of this space with the discrete space {0, 1}, both points
having equal probabilities, by

X(ω, 0) = Z(ω),  X(ω, 1) = −Z(ω).

Then it is elementary to check that the second equality in (1) is
violated.


6.8. A non-uniformly integrable family of random 
variables 


Consider the sequence of r.v.s {X_n, n ≥ 1} where

P[X_n = 2^n] = 2^{−n},  P[X_n = 0] = 1 − 2^{−n}.

({X_n} arises in the so-called St. Petersburg paradox, see e.g.
Szekely 1986.) Then the following relation clearly holds:

∫_{[|X_n| > a]} |X_n| dP = 0, if a ≥ 2^n
∫_{[|X_n| > a]} |X_n| dP = 1, if a < 2^n.

This means that ∫_{[|X_n| > a]} |X_n| dP does not tend to zero uniformly
in n as a → ∞. However, for each n, X_n is integrable since EX_n =
1.

Hence {X_n} is an integrable but not uniformly integrable family
of r.v.s.
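
The failure of uniform integrability is easy to see by direct computation. The sketch below is only an illustration for the sequence with P[X_n = 2^n] = 2^{−n} and P[X_n = 0] = 1 − 2^{−n}; the supremum is taken over n ≤ n_max, an arbitrary cut-off, so it is a lower bound for the true supremum. The function names are hypothetical.

```python
from fractions import Fraction


def tail_integral(n, a):
    """Integral of |X_n| over the event [|X_n| > a]."""
    # X_n takes the value 2^n with probability 2^(-n), and 0 otherwise.
    value, prob = 2 ** n, Fraction(1, 2 ** n)
    return value * prob if value > a else Fraction(0)


def sup_tail(a, n_max=200):
    """Maximum of the tail integrals over n = 1, ..., n_max."""
    return max(tail_integral(n, a) for n in range(1, n_max + 1))


for a in (10, 10**10, 10**30):
    print(a, sup_tail(a))
```

For each fixed n the tail integral is eventually 0, yet the supremum over n stays equal to 1 for every level a below 2^{n_max}.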


6.9. On the relation E[E(X|Y)] = EX 


The definition of the conditional expectation of the r.v. X given
another r.v. Y or some σ-field requires X to be integrable: E[|X|] <
∞. In this case the equality E[E(X|Y)] = EX holds. However, the
following 'reasoning' appears to contradict this result.

Let Y be a positive r.v. whose density g_ν(y) is given by

(1)  g_ν(y) = (ν/2)^{ν/2} (Γ(ν/2))^{−1} y^{ν/2 − 1} e^{−νy/2},  y > 0,  ν > 0

(compare this with a gamma distribution). Suppose the conditional
distribution of X given Y = y is specified for y > 0 by the following
probability density function:

(2)  f(x|y) = (2π)^{−1/2} y^{1/2} e^{−yx^2/2},  x ∈ R^1.

Therefore

E[X|Y = y] = ∫_{−∞}^{∞} x f(x|y) dx = 0 ⇒ E[X|Y] = 0 ⇒ E[E(X|Y)] = 0.

On the other hand, (1) and (2) imply that the marginal density of X
is

(3)  f(x) = Γ((ν + 1)/2) (Γ(ν/2) (πν)^{1/2})^{−1} (1 + x^2/ν)^{−(ν + 1)/2},  x ∈ R^1,

that is, X has a Student distribution with ν degrees of freedom. In
particular, for ν = 1, X has a Cauchy distribution. In this case EX
does not exist and hence E[E(X|Y)] ≠ EX.

The reason for this 'contradiction' is in the approach used
above: we started from (2), which yields E(X|Y) = 0, then from (1)
and (2) derived (3) which is a density of a r.v. without expectation.


6.10. Is it possible to extend one of the properties of the 
conditional expectation? 


Consider three r.v.s, say X, Y, Z. Suppose X is integrable and,
moreover, X and Z are independent. Then E[X|Z] = EX a.s. Having
this property we could assume that

(1)  E[X|Y, Z] = E[X|Y] a.s.

Our purpose now is to show that in general such an 'extension'
is impossible. To see this, take Ω = [0, 1], ℱ = ℬ_{[0,1]} and P the
Lebesgue measure. Define the following r.v.s:


,_ fi, if we 0,4) 
Xw)= 45 if w € [4,1], 
Y(w) = fs if w € [3,1], 
Ae) = 4g if w ¢ (4, 3). 


Then we can check that X and Z are independent. Furthermore,


SO bpabco 
: 


| —— 0, if we} 
2 fT oc 3 ; a 
Li 


). otherwise, sacs ( 
Therefore E[X|Y, Z] ≠ E[X|Y] and in general (1) does not hold.


6.11. The mean-median-mode inequality may fail to hold 


Suppose X is a r.v. with mean μ. A number m is called a median of
X if P(X ≥ m) ≥ 1/2 and P(X ≤ m) ≥ 1/2. It is easy to see that such m
always exists, but in general X may have several medians. If X is
unimodal and M is its mode, then the median m is unique and for
M, m and μ we have either M ≤ m ≤ μ or M ≥ m ≥ μ, that is, the median
falls between the mean and the mode. A result of this kind is
referred to as a mean-median-mode inequality.

Recall that the symbol ≥_s is used to denote a stochastic
domination: for two r.v.s ξ and η, ξ ≥_s η ⇔ P[ξ > x] ≥ P[η > x] for
all x.

Let us cite the following statement (Dharmadhikari and Joag-
Dev 1988): if X is a unimodal r.v. with mode M, median m and
mean μ and (X − m)^+ ≥_s (X − m)^−, then M ≤ m ≤ μ.
Our goal now is to describe a case when the mean-median-mode
inequality does not hold. Consider a r.v. X with density

f(x) = 0,                 if x ≤ 0
f(x) = x,                 if 0 < x ≤ c
f(x) = c e^{−λ(x − c)},   if x > c.

Here c and λ are positive constants and f is a density iff c^2/2 + c/λ =
1. We can easily find the mean, the median and the mode of X:

μ = c^3/3 + c^2/λ + c/λ^2,  m = 1 (when c ≥ 1),  M = c.

Now let c → 1. Then λ → 2, μ → 13/12 > 1 and if c is sufficiently
close to 1 but c > 1, then μ > c and M = c > 1. Here the median m
(= 1) does not fall between the mean μ (> 1) and the mode M (> 1),
i.e. the mean-median-mode inequality does not hold despite the
fact that the density f is unimodal.
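
A concrete value of c makes the ordering visible. The sketch below is an illustration that assumes the density f(x) = x on (0, c] and f(x) = c e^{−λ(x − c)} for x > c, with λ determined by c^2/2 + c/λ = 1; the choice c = 101/100 and the function names are arbitrary.

```python
from fractions import Fraction


def rate(c):
    """lambda solving c^2/2 + c/lambda = 1 (requires c^2 < 2)."""
    return c / (1 - c * c / 2)


def total_mass(c):
    lam = rate(c)
    return c * c / 2 + c / lam


def cdf_on_linear_part(x, c):
    """F(x) for 0 <= x <= c, where the density equals x."""
    assert 0 <= x <= c
    return x * x / 2


def mean(c):
    lam = rate(c)
    # integral of x*x over (0, c) plus integral of x*c*exp(-lam*(x-c)) over (c, inf)
    return c ** 3 / 3 + c * c / lam + c / lam ** 2


c = Fraction(101, 100)
mode = c
median = Fraction(1)  # F(1) = 1/2 because c >= 1
print(float(median), float(mode), float(mean(c)))
```

With this c the three characteristics are ordered as median < mode < mean, so the median is not between the other two. The value of c must be close to 1: for larger c (for example c = 1.05) the mean drops below the mode and the counterexample disappears.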


6.12. Not all properties of conditional expectations have 
analogues for conditional medians 


Recall that the conditional median of the r.v. X with respect to the
σ-field 𝒟 is defined as a 𝒟-measurable r.v., say m, such that

P[X ≥ m|𝒟] ≥ 1/2 ≤ P[X ≤ m|𝒟].

By using the notation μ(X|𝒟) for the conditional median we want
to see if the properties of conditional expectations can be extended
to conditional medians.

In the examples below X and Y are r.v.s, 𝒟 is a σ-field, ℱ_0 is the
trivial σ-field (ℱ_0 = {∅, Ω}) and I(·) is the indicator function.

(i) It is not always possible to find conditional medians satisfying
μ(X + Y|𝒟) = μ(X|𝒟) + μ(Y|𝒟).

Indeed, let X_1 and X_2 be i.i.d. r.v.s with P[X_1 = 0] = 1/3 = 1 − P[X_1 =
1] and put X = X_1 X_2, Y = X_1 − X_1 X_2. Then μ(X|ℱ_0) = 0, μ(Y|ℱ_0) =
0 and even XY = 0 while μ(X + Y|ℱ_0) = μ(X_1|ℱ_0) = 1. Thus the
linear property of the conditional expectation (E[X + Y|𝒟] = E[X|𝒟]
+ E[Y|𝒟]) does not in general hold for conditional medians.


(ii) It is not always possible to find conditional medians satisfying

μ(μ(X|𝒟)|𝒟_1) = μ(X|𝒟_1),  𝒟_1 ⊂ 𝒟.


Consider the r.v.s X and Y where P[Y = k] = 4, k = 0, 1, 2; P[X= 


1|Y = k| — 
8 = 1-P[X =0|Y =k], k =0, 1, and P[X = 1|Y = 2] = 2 =1-P[X =0|Y = 2]. Let D 
be the o-field generated by Y, Since 
PA = 4 then u(X|Fo) = 1. However, u(X|!P) = W(X) = 
I(Y = 2) so μ(μ(X|𝒟)|ℱ_0) = 0. Therefore the smoothing property
(E[E(X|𝒟)|𝒟_1] = E[X|𝒟_1]) also does not in general hold for
conditional medians.


(iii) If the r.v. X is independent of the σ-field 𝒟, it does not
necessarily follow that every conditional median μ(X|𝒟) is
constant. To see this we need the following result (Tomkins
1975a): if X is independent of 𝒟, then every median μ(X|ℱ_0) of X
is a conditional median of X with respect to 𝒟.

Now consider two independent r.v.s X and Y, each taking the
values 1 and 0 with probability 1/2. Let 𝒟 = 𝒟_Y be the σ-field
generated by Y. Then X is independent of 𝒟_Y but the conditional
median of X with respect to 𝒟_Y is equal to Y, that is, it is not
constant.


SECTION 7. INDEPENDENCE OF RANDOM VARIABLES


Two r.v.s X_1 and X_2 on a given probability space (Ω, ℱ, P) are
called independent if

(1)  P[X_1 ∈ B_1, X_2 ∈ B_2] = P[X_1 ∈ B_1] P[X_2 ∈ B_2]

for any B_1, B_2 ∈ ℬ^1. If F(x_1, x_2), (x_1, x_2) ∈ R^2 is the joint d.f. of X_1
and X_2 and F_1(x_1), x_1 ∈ R^1 and F_2(x_2), x_2 ∈ R^1 are their respective
marginal d.f.s then (1) is expressed as

(2)  F(x_1, x_2) = F_1(x_1) F_2(x_2) for all x_1, x_2 ∈ R^1.

In the absolutely continuous case the independence of X_1 and X_2
can be written in terms of the corresponding densities by

(3)  f(x_1, x_2) = f_1(x_1) f_2(x_2) for all x_1, x_2 ∈ R^1.

If X_1 and X_2 are discrete r.v.s with
P[X_1 = x_{1i}] = p_{1i}, p_{1i} > 0, i = 1, 2, ..., Σ_i p_{1i} = 1 and
P[X_2 = x_{2j}] = p_{2j}, p_{2j} > 0, j = 1, 2, ..., Σ_j p_{2j} = 1, then X_1
and X_2 are independent iff

(4)  P[X_1 = x_{1i}, X_2 = x_{2j}] = P[X_1 = x_{1i}] P[X_2 = x_{2j}]

or, equivalently, p_{ij} = p_{1i} p_{2j} for all possible i, j, where
p_{ij} = P[X_1 = x_{1i}, X_2 = x_{2j}].

We say that X_1, ..., X_n is a family of mutually independent r.v.s if
for every k, 2 ≤ k ≤ n and 1 ≤ i_1 < i_2 < ... < i_k ≤ n the following
relation holds:

(5)  P[X_{i_1} ∈ B_{i_1}, ..., X_{i_k} ∈ B_{i_k}] = P[X_{i_1} ∈ B_{i_1}] ... P[X_{i_k} ∈ B_{i_k}]

for arbitrary Borel sets B_{i_1}, ..., B_{i_k}. If (5) is valid only for k = 2, the
variables X_1, ..., X_n are called pairwise independent. It is clear how
mutual independence and pairwise independence of r.v.s can be
expressed through the corresponding d.f.s (see (2)), and how to do
this in the absolutely continuous case (see (3)) and in the discrete
case (see (4)).

Parallel to the notion of independence we can introduce the
closely related notion of conditional independence. Let 𝒟 be a σ-
field, 𝒟 ⊂ ℱ and 𝒟_1, 𝒟_2 be classes of events. Then 𝒟_1 and 𝒟_2 are
said to be conditionally independent given 𝒟 if, for all D_1 ∈ 𝒟_1
and D_2 ∈ 𝒟_2, the following relation holds:

P(D_1 D_2|𝒟) = P(D_1|𝒟) P(D_2|𝒟) a.s.

Obviously this definition includes the conditional independence of
random events and of random variables.
Let X and Y be r.v.s with 0 < VX < ∞, 0 < VY < ∞. The quantity

ρ(X, Y) = E[(X − EX)(Y − EY)] / (VX·VY)^{1/2}

is said to be a correlation coefficient between X and Y (simply, a
correlation of X and Y). If ρ(X, Y) = 0, the variables X and Y are
called uncorrelated.

We refer the reader to the books by Feller (1968, 1971), Chung 
(1974), Chow and Teicher (1978), Laha and Rohatgi (1979), 
Shiryaev (1995) for a detailed treatment of the notion of 
independence and several related topics. 

The examples in this section examine the relationship between 
independence, dependence and related properties of r.v.s. 


7.1. Discrete random variables which are pairwise but 
not mutually independent 


Using some of the examples in Section 3 we can easily construct 
sets of r.v.s with different independence/dependence properties. 


(i) Let (X, Y, Z) take each of the values (1,0,0), (0,1,0), (0,0,1),
(1,1,1) with probability 1/4. Then X, Y and Z are pairwise
independent. For example, it is easy to see that P[X = 1, Z = 0] = 1/4
= P[X = 1]P[Z = 0]. However,

P[X = 1, Y = 1, Z = 1] = 1/4 ≠ 1/8 = P[X = 1]P[Y = 1]P[Z = 1],

and hence the three variables are not mutually independent.


(ii) Let Ω consist of nine points: the permutations of 1, 2, 3 and the
triplets (1,1,1), (2,2,2), (3,3,3). Each has probability 1/9. Introduce
three r.v.s, say X_1, X_2, X_3, where X_k equals the number appearing
at the kth place. The possible values of these variables are 1, 2, 3
and we can easily show that

(1)  P[X_j = r] = 1/3,  P[X_i = r, X_j = s] = 1/9,  i ≠ j,  r, s = 1, 2, 3.

It follows immediately from (1) that X_1, X_2, X_3 are pairwise
independent. Since X_1 and X_2 uniquely determine X_3, the three
variables are not mutually independent.


(iii) Let us continue the construction in case (ii). Consider new
triplets (X_4, X_5, X_6), (X_7, X_8, X_9), ..., similar in structure to
(X_1, X_2, X_3) and each independent of (X_1, X_2, X_3). Thus we obtain
an infinite sequence of r.v.s X_1, X_2, ..., X_n, .... Clearly, any two
members X_i, X_j of this sequence satisfy relations (1). However the
product rule does not hold for any k, k ≥ 3, of these variables. Thus
the r.v.s {X_n, n ≥ 1} are only pairwise independent.
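
Cases (i) and (ii) are small enough to be verified by enumeration. The sketch below is only an illustration: it treats all listed outcomes as equally likely and checks the product rule for point masses, which characterises independence for discrete r.v.s. The helper is_independent is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def is_independent(outcomes, indices):
    """Check the product rule for the coordinates in `indices`,
    with all outcomes in the list equally likely."""
    total = len(outcomes)

    def prob(constraints):
        hits = sum(all(w[i] == v for i, v in constraints) for w in outcomes)
        return Fraction(hits, total)

    supports = [sorted({w[i] for w in outcomes}) for i in indices]
    for values in product(*supports):
        joint = prob(list(zip(indices, values)))
        marginals = Fraction(1)
        for i, v in zip(indices, values):
            marginals *= prob([(i, v)])
        if joint != marginals:
            return False
    return True


case_i = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
case_ii = list(permutations((1, 2, 3))) + [(k, k, k) for k in (1, 2, 3)]

for name, outcomes in (("(i)", case_i), ("(ii)", case_ii)):
    pairwise = all(is_independent(outcomes, pair)
                   for pair in combinations(range(3), 2))
    mutual = is_independent(outcomes, (0, 1, 2))
    print(name, "pairwise:", pairwise, "mutual:", mutual)
```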


7.2. Absolutely continuous random variables which are 
pairwise but not mutually independent 


Let ξ and η be two independent r.v.s uniformly distributed in the
interval (0, π). Define the variables X_1 = tan ξ, X_2 = tan η, X_3 =
−tan(ξ + η). The variables X_1 and X_2 are independent, as a
consequence of the independence of ξ and η. By finding the
distribution of X_3 we can establish that X_3 and X_1 are independent,
as are X_3 and X_2. However, these variables are functionally
dependent by the relation X_1 + X_2 + X_3 = X_1 X_2 X_3 and thus they
cannot be mutually independent.

Thus we have constructed a triplet of r.v.s which are pairwise
but not mutually independent (equivalently, independent at level 2
and dependent at level 3).


7.3. A set of dependent random variables such that any
of its subsets consists of mutually independent
variables


If X_1, ..., X_n are r.v.s, n ≥ 3, and we know that they are mutually
independent, then any proper subset of them consists of mutually
independent variables. However, in general the converse statement
is not true (see Examples 7.1 and 7.2, or construct analogues to
some of the examples in Section 3). Here we shall consider
examples covering the discrete and the absolutely continuous
cases.


(i) Let n ≥ 3 and A ⊂ R^{n−1} be the set of all (n − 1)-dimensional
vectors of the type a = (a_1, ..., a_{n−1}) where a_i = 1 or 0, i = 1, ...,
n − 1. Obviously A contains 2^{n−1} elements (vectors): |A| = 2^{n−1}. Let
I(a) = a_1 + ... + a_{n−1}, so I(a) takes values 0, 1, ..., n − 1. Let B ⊂ R^n
be the set of all vectors b where

b = (a_1, ..., a_{n−1}, 1), if I(a) is even
b = (a_1, ..., a_{n−1}, 0), if I(a) is odd.

Then T: a → b is a one-one mapping of A onto B, so |B| = 2^{n−1}
and, moreover, B is permutation invariant.

Let a^{(j)} be an (n − 1)-dimensional vector obtained from b by
eliminating the jth component of b. Denote by A^{(j)} the set of all
such vectors a^{(j)}. Thus we have defined the mapping T_j: B →
A^{(j)}. Clearly, A^{(n)} = A and since B is permutation invariant, we
have A^{(j)} = A^{(n)} for all j = 1, ..., n − 1 and hence A^{(j)} = A for all j =
1, ..., n.

Now define the n-dimensional random vector X = (X_1, ..., X_n)
taking values in the set B and with a distribution given by

(1)  P[X = x] = 2^{−(n−1)}, if x ∈ B
     P[X = x] = 0,          otherwise.

Let X^{(j)} = T_j X = (X_1, ..., X_{j−1}, X_{j+1}, ..., X_n). Since
T_j are one-one mappings of B onto A, we find easily that the
distribution of X^{(j)} is given by

(2)  P[X^{(j)} = x^{(j)}] = 2^{−(n−1)}, if x^{(j)} ∈ A
     P[X^{(j)} = x^{(j)}] = 0,          otherwise.

The next step is to use relation (2) in order to find the marginal
distribution of each of the components X_j of the vector X. We have

(3)  P[X_j = x_j] = 1/2, if x_j = 0 or 1
     P[X_j = x_j] = 0,   otherwise.

Now, comparing (1), (2) and (3) we arrive at the following
conclusion: we have constructed n dependent discrete r.v.s X_1, ...,
X_n which are (n − 1)-wise independent, that is, any proper subset of
which consists of mutually independent variables, being in this case
even identically distributed.
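
For small n the construction can be checked by enumeration. The sketch below is an illustration which assumes that the last coordinate of b equals 1 when a_1 + ... + a_{n−1} is even and 0 when it is odd, so that B is the set of 0/1 vectors with an odd number of ones. It uses the fact that k binary variables are independent and uniform exactly when their joint distribution is uniform on {0, 1}^k. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def build_B(n):
    """All vectors (a_1, ..., a_{n-1}, b_n) with b_n = 1 if the sum of the
    a_i is even and b_n = 0 if it is odd."""
    return [a + ((1,) if sum(a) % 2 == 0 else (0,))
            for a in product((0, 1), repeat=n - 1)]


def marginal(B, indices):
    """Distribution of the coordinates in `indices` when X is uniform on B."""
    dist = {}
    for b in B:
        key = tuple(b[i] for i in indices)
        dist[key] = dist.get(key, Fraction(0)) + Fraction(1, len(B))
    return dist


def is_uniform_on_cube(dist, k):
    """True if dist puts mass 2^(-k) on every point of {0, 1}^k."""
    return len(dist) == 2 ** k and all(p == Fraction(1, 2 ** k)
                                       for p in dist.values())


n = 4
B = build_B(n)
proper_ok = all(is_uniform_on_cube(marginal(B, idx), k)
                for k in range(1, n)
                for idx in combinations(range(n), k))
full_ok = is_uniform_on_cube(marginal(B, tuple(range(n))), n)
print("all proper subsets independent:", proper_ok)
print("all n variables independent:", full_ok)
```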

(ii) Let X be a r.v. with density function f and mean μ = EX. Let
X_1, ..., X_n, n ≥ 3, be r.v.s and take a function of the following type
as their joint density:

(4)  g_n(x_1, ..., x_n) = [Π_{j=1}^{n} f(x_j)] [1 + Π_{j=1}^{n} (x_j − μ) f(x_j)],  each x_j ∈ R^1.

We consider g_n only for those x_j ∈ R^1, j = 1, ..., n, for which |x_j −
μ| f(x_j) ≤ 1. Otherwise we put g_n(·) = 0. Then g_n is a non-negative
function. In order for (4) to be a density function, the integral of g_n
over the range of (X_1, ..., X_n) described above must be equal to 1.
This leads to the condition

(5)  ∫_{−∞}^{∞} (x − μ) f^2(x) dx = 0.

Notice that (5) is satisfied if, for example, the density f is
symmetric about its mean value μ.

Let the density f satisfy (5), g_n be defined by (4), and X_1, ..., X_n
be r.v.s with density g_n. Our purpose now is to establish what
dependence there is between these n variables.

By direct integration of (4) we find that each of the r.v.s X_1, ...,
X_n has as its density the given function f. Suppose we have chosen
k of the X's; without loss of generality we can choose X_1, ..., X_k, 2 ≤ k < n.
Denote by h_k(x_1, ..., x_k), (x_1, ..., x_k) ∈ R^k the joint density of X_1, ...,
X_k. Then from (4) and (5) we can easily show that

h_k(x_1, ..., x_k) = f(x_1) ... f(x_k),  (x_1, ..., x_k) ∈ R^k.

Obviously this relation implies that X_1, ..., X_k are mutually
independent. Of course, the same holds for any k-subset of
X_1, ..., X_n where 2 ≤ k < n. Nevertheless all n r.v.s X_1, ..., X_n are not
mutually independent because (4) implies that g_n(x_1, ..., x_n) ≠ f(x_1)
... f(x_n).

It is useful to consider the following case. Let X be distributed
uniformly on the interval (0, c), 0 < c < ∞. Its density is f(x) = 1/c
for 0 < x < c and 0 otherwise. Then μ = EX = c/2 and (5) is
satisfied. Take the random vector (X_1, ..., X_n) with density

g_n(x_1, ..., x_n) = c^{−n} [1 + c^{−n} Π_{j=1}^{n} (x_j − c/2)], if 0 < x_j < c, j = 1, ..., n
g_n(x_1, ..., x_n) = 0,                                       otherwise.

Clearly g_n is not the uniform density on the n-dimensional cube (0,
c)^n in R^n and X_1, ..., X_n cannot be mutually independent. However,
any k of them, 2 ≤ k < n, will be distributed uniformly in the cube
(0, c)^k in R^k and these k variables are mutually independent.

Hence we have described collections of n dependent absolutely
continuous r.v.s which are (n − 1)-wise independent.


(iii) Consider the following function

f(x_1, ..., x_n) = (2π)^{−n} (1 − cos x_1 ... cos x_n), if (x_1, ..., x_n) ∈ Q_n
f(x_1, ..., x_n) = 0,                                   otherwise

where Q_n is the n-dimensional cube [0, 2π]^n in R^n. It is easy to
check that f is non-negative and the integral of f over R^n equals 1.
Hence f is a probability density function of a random vector in R^n,
say of (X_1, ..., X_n).

Denoting by f_k(x_k) the marginal density of the component X_k,
we find that

f_k(x_k) = 1/(2π), if 0 ≤ x_k ≤ 2π
f_k(x_k) = 0,      otherwise

implying that X_k is uniformly distributed on the interval [0, 2π] and
this holds for any (single) r.v. X_1, X_2, ..., X_n. The form of their joint
density f shows that these variables are not independent. If,
however, we take any k of them, we conclude that for 2 ≤ k ≤ n − 1
they are independent (their joint density is equal to 1/(2π)^k on the
cube Q_k = [0, 2π]^k in R^k).

Therefore X_1, ..., X_n is another collection of n dependent r.v.s
which are (n − 1)-wise independent. (Compare with case (ii)
above.)


7.4. Collection of n dependent random variables which
are k-wise independent


In Example 7.3 we have described collections of n dependent r.v.s
which are (n − 1)-wise independent. Thus it is of general interest
to see collections of n dependent r.v.s which are k-wise
independent with k < n − 1.

We present two examples: in the first we have n = 4, k = 2,
while in the second n = 5, k = 3.


(i) Let $F_1, F_2, F_3, F_4$ be d.f.s on $\mathbb{R}^1$ (or on its subsets). Denote
$G_j = 1 - F_j$ and define the function $H_{1234}(x_1, x_2, x_3, x_4)$,
$(x_1, x_2, x_3, x_4) \in \mathbb{R}^4$, as follows (for simplicity we omit the
arguments but we know they are real):

$$
H_{1234} = F_1 F_2 F_3 F_4 \{1 + \varepsilon_1 G_2 G_3 G_4 + \varepsilon_2 G_1 G_3 G_4
+ \varepsilon_3 G_1 G_2 G_4 + \varepsilon_4 G_1 G_2 G_3\}.
$$

Our first claim is that if $\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4$ are non-zero numbers in
the interval $(-1, 1)$ and $|\varepsilon_1| + |\varepsilon_2| + |\varepsilon_3| + |\varepsilon_4| \le 1$, then $H_{1234}$ is a
four-dimensional d.f. Let $(\xi_1, \xi_2, \xi_3, \xi_4)$ be a random vector whose
d.f. is just $H_{1234}$. We are interested in the
independence/dependence properties of the components of this
vector, so we need to know its $k$-dimensional marginal
distributions for $k = 3$, 2 and 1. For example, if $H_{123}$, $H_{12}$ and $H_1$
are the d.f.s of $(\xi_1, \xi_2, \xi_3)$, $(\xi_1, \xi_2)$ and $\xi_1$ respectively, we easily
find that

$$
H_{123} = F_1 F_2 F_3 \{1 + \varepsilon_4 G_1 G_2 G_3\}, \quad H_{12} = F_1 F_2 \quad \text{and} \quad H_1 = F_1.
$$

It is quite clear how to write down the d.f. of any possible subset of
components of the vector $(\xi_1, \xi_2, \xi_3, \xi_4)$.
Thus we arrive at the following conclusions:
(a) $\xi_j$ has a d.f. equal to $F_j$, $j = 1, 2, 3, 4$;
(b) any two of the r.v.s $\xi_1, \xi_2, \xi_3, \xi_4$ are independent;
(c) any three of them as well as all four are dependent.

Therefore $\{\xi_1, \xi_2, \xi_3, \xi_4\}$ is a collection of dependent r.v.s
which are two-wise (= pairwise) independent.
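
The marginal computations in case (i) can be reproduced mechanically. The following sketch assumes that every $F_j$ is the uniform d.f. on $[0, 1]$ and picks arbitrary values of $\varepsilon_1, \ldots, \varepsilon_4$ with $|\varepsilon_1| + \cdots + |\varepsilon_4| \le 1$; both choices are illustrative assumptions, not part of the example. A marginal d.f. is obtained by sending the unused arguments to $+\infty$, i.e. by setting the corresponding $F_j = 1$.

```python
# Hypothetical coefficients with |eps_1| + ... + |eps_4| <= 1.
EPS = (0.2, -0.3, 0.1, 0.25)

def F(x):
    """Uniform d.f. on [0, 1], assumed here for every F_j."""
    return min(max(x, 0.0), 1.0)

def H(x):
    """H_1234 = F1 F2 F3 F4 * {1 + sum_j eps_j * prod_{i != j} G_i}, with G_i = 1 - F_i."""
    Fs = [F(v) for v in x]
    Gs = [1.0 - f for f in Fs]
    base = Fs[0] * Fs[1] * Fs[2] * Fs[3]
    correction = 0.0
    for j in range(4):
        term = EPS[j]
        for i in range(4):
            if i != j:
                term *= Gs[i]
        correction += term
    return base * (1.0 + correction)

def marginal(x_partial):
    """Marginal d.f.: a coordinate given as None is sent to +infinity (F = 1 there)."""
    return H([1.0 if v is None else v for v in x_partial])

two_dim = marginal((0.3, 0.6, None, None))    # should equal F1 * F2 = 0.18
three_dim = marginal((0.3, 0.6, 0.5, None))   # F1 F2 F3 * (1 + eps_4 G1 G2 G3)
```

With these numbers the two-dimensional marginal factorizes, while the three-dimensional one carries the extra factor $1 + \varepsilon_4 G_1 G_2 G_3$ and does not.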


(ii) Suppose that we have five d.f.s $F_1, F_2, F_3, F_4, F_5$ and as above
we use the notation $G_j = 1 - F_j$, $j = 1, \ldots, 5$. Define the function
$H_{12345}(x_1, x_2, x_3, x_4, x_5)$, $(x_1, x_2, x_3, x_4, x_5) \in \mathbb{R}^5$, as follows:

$$
H_{12345} = F_1 F_2 F_3 F_4 F_5 \{1 + \varepsilon_1 G_2 G_3 G_4 G_5 + \varepsilon_2 G_1 G_3 G_4 G_5
+ \varepsilon_3 G_1 G_2 G_4 G_5 + \varepsilon_4 G_1 G_2 G_3 G_5 + \varepsilon_5 G_1 G_2 G_3 G_4\}.
$$

If $\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4, \varepsilon_5$ are non-zero numbers in the interval $(-1, 1)$
and $|\varepsilon_1| + |\varepsilon_2| + |\varepsilon_3| + |\varepsilon_4| + |\varepsilon_5| \le 1$, then $H_{12345}$ is a five-dimensional
d.f. of a random vector in $\mathbb{R}^5$, say $(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$.
In order to clarify what kind of independence/dependence there
exists between the components of this vector, we have first to find
all $k$-dimensional marginal distributions for $k = 4, 3, 2, 1$. In
particular, if $H_{1234}$, $H_{123}$, $H_{12}$ and $H_1$ are the d.f.s of
$(\eta_1, \eta_2, \eta_3, \eta_4)$, $(\eta_1, \eta_2, \eta_3)$, $(\eta_1, \eta_2)$ and $\eta_1$ respectively, we find that

$$
H_{1234} = F_1 F_2 F_3 F_4 \{1 + \varepsilon_5 G_1 G_2 G_3 G_4\}, \quad H_{123} = F_1 F_2 F_3, \quad
H_{12} = F_1 F_2, \quad H_1 = F_1.
$$

Similarly we can write the d.f.s in all the remaining cases, thus
arriving at the following conclusions:

(a) $\eta_j$ has a d.f. equal to $F_j$, $j = 1, 2, 3, 4, 5$;

(b) any two of the r.v.s $\eta_1, \eta_2, \eta_3, \eta_4, \eta_5$ are independent;

(c) any three of them are independent;

(d) any four, as well as all five, variables are dependent.

Hence $\{\eta_1, \eta_2, \eta_3, \eta_4, \eta_5\}$ is a collection of dependent r.v.s
which are three-wise independent.

Note finally that a similar idea can be used when describing $n$
dependent r.v.s which are $m$-wise independent. In cases (i) and (ii)
above, as well as in the general case, the description can be done in
terms of probability density functions.


7.5. An independence-type property for random 
variables 


Let $X_1, X_2, \ldots$ be positive integer-valued r.v.s and $S_k = X_1 + \cdots + X_k$.
Suppose that $Y_1, Y_2, \ldots$ is another sequence of i.i.d. positive
integer-valued r.v.s with $P[Y_1 = i] = p_i$, $p_i \ge 0$, $\sum_{i=1}^{\infty} p_i = 1$,
and for all $k \ge 1$ and $i \ge 1$ the following relation holds:

$$
P[S_k = i] = P[Y_1 + \cdots + Y_k = i]. \tag{1}
$$

For various purposes one needs to find $P[S_1 = i_1, S_2 = i_2, \ldots, S_r = i_r]$.
Taking into account (1), the equalities $S_2 = i_1 + X_2$, $S_3 = i_2 + X_3$,
$\ldots$, $S_r = i_{r-1} + X_r$ and the independence of the $Y$s, we can suppose
that

$$
P[S_1 = i_1, S_2 = i_2, \ldots, S_r = i_r] = p_{i_1} p_{i_2 - i_1} \cdots p_{i_r - i_{r-1}}. \tag{2}
$$

Obviously (2) is satisfied if the variables $X_1, X_2, \ldots$ are
independent. Thus we want to know whether or not relation (2)
holds for any choice of the sequence $\{X_k\}$.

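Let $p_1, p_2, p_3$ be positive numbers with $p_1 + p_2 + p_3 = 1$.
Denote by $Y$ a r.v. taking the values 1, 2, 3 with probabilities $p_1$,
$p_2$, $p_3$ respectively, and let $\{Y_k, k \ge 1\}$ be a sequence of
independent copies of $Y$.
Now define the pair of r.v.s $(X_1, X_2)$ as follows:

$$
P[X_1 = i, X_2 = j] = p_i p_j + \varepsilon_{ij}, \quad i, j = 1, 2, 3,
$$

with $\varepsilon_{11} = \varepsilon_{22} = \varepsilon_{33} = 0$, $\varepsilon_{21} = \varepsilon_{32} = \varepsilon_{13} = \varepsilon$ and
$\varepsilon_{12} = \varepsilon_{23} = \varepsilon_{31} = -\varepsilon$, where the real number $\varepsilon$ is chosen so that
$|\varepsilon| < \min\{p_1 p_2, p_2 p_3, p_1 p_3\}$. Let $(X_3, X_4), (X_5, X_6), \ldots$ be independent
copies of the pair $(X_1, X_2)$. Thus we obtain the sequence $X_1, X_2, \ldots, X_n, \ldots$.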

We want to determine whether the sequences $\{X_k\}$ and $\{Y_k\}$ just
defined satisfy conditions (1) and (2). Evidently, for all $i$, $j$ we have

$$
P[X_k = i] = p_i \quad \text{and} \quad P[X_1 + X_2 = j] = P[Y_1 + Y_2 = j]
$$

and (1) holds. Furthermore, if $\varepsilon \ne 0$ then

$$
P[S_1 = 2, S_2 = 3] = P[X_1 = 2, X_2 = 1] = p_2 p_1 + \varepsilon \ne p_2 p_1
$$

and hence (2) is not satisfied. Therefore the independence property
for the sequence $\{X_k\}$ is essential for the validity of (2).
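
A small exact computation makes the construction concrete. The sketch assumes the perturbed joint law $P[X_1 = i, X_2 = j] = p_i p_j + \varepsilon_{ij}$ with the pattern of $\varepsilon_{ij}$ described above; the particular values $p = (1/2, 1/3, 1/6)$ and $\varepsilon = 1/20$ are hypothetical choices satisfying $|\varepsilon| < \min\{p_1 p_2, p_2 p_3, p_1 p_3\}$.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical probabilities p_1, p_2, p_3 and perturbation size e.
p = {1: Fr(1, 2), 2: Fr(1, 3), 3: Fr(1, 6)}
e = Fr(1, 20)   # below min{p1*p2, p2*p3, p1*p3} = 1/18

eps = {(1, 1): 0, (2, 2): 0, (3, 3): 0,
       (2, 1): e, (3, 2): e, (1, 3): e,
       (1, 2): -e, (2, 3): -e, (3, 1): -e}

# Perturbed joint law of (X1, X2) and the independent reference law of (Y1, Y2).
joint = {(i, j): p[i] * p[j] + eps[(i, j)] for i, j in product(p, repeat=2)}
indep = {(i, j): p[i] * p[j] for i, j in product(p, repeat=2)}

def law_of_sum(table):
    """Distribution of the sum of the two coordinates."""
    out = {}
    for (i, j), pr in table.items():
        out[i + j] = out.get(i + j, 0) + pr
    return out

marg_x1 = {i: sum(joint[(i, j)] for j in p) for i in p}
marg_x2 = {j: sum(joint[(i, j)] for i in p) for j in p}
```

The marginals and the law of $X_1 + X_2$ agree with the independent case, so (1) holds for $k = 1, 2$, yet the cell $(2, 1)$ differs from $p_2 p_1$, which is what breaks (2).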


7.6. Dependent random variables X and Y such that X^2
and Y^2 are independent


It is well known that if $X$ and $Y$ are independent r.v.s, then for any
continuous functions $g$ and $h$, the r.v.s $g(X)$ and $h(Y)$ are also
independent (see Gnedenko 1962; Feller 1971). The converse
statement is true if the functions $g$ and $h$ are one-one mappings of
$\mathbb{R}^1$ to $\mathbb{R}^1$. However, we can choose functions $g$ and $h$ without this
condition such that $g(X)$ and $h(Y)$ are independent r.v.s but $X$ and $Y$
themselves are not. We present two examples treating the discrete
and the absolutely continuous cases.


(i) Consider the two-dimensional random vector $(X, Y)$ with
$p_{ij} = P[X = i, Y = j]$, $i, j = -1, 0, 1$,


where 


1 ‘ _ —— _ 
35; P-1,-1 = Pi,-1 = Pi,0 = 


ul) jis 
i= Po--1t spy POO — 5. 


T T — a Ge 
Pla SH FP=14h = Pol = 39> 


P-1 


Ino 


It is easy to check that $X^2$ and $Y^2$ are independent r.v.s but $X$ and
$Y$ are not.

(ii) Let $X_1$ and $X_2$ be two independent absolutely continuous r.v.s.
Take another r.v. $Y$ which is independent of $X_1$, $X_2$ and assumes
the values $+1$ and $-1$ with probability $\frac{1}{2}$ each. Define two new r.v.s,
say $Z_1$ and $Z_2$, by

$$
Z_1 = Y X_1, \quad Z_2 = Y X_2.
$$

The absolute continuity of $X_1$ and $X_2$ implies that $Z_1$ and $Z_2$ are
absolutely continuous. Obviously, $Z_1$ and $Z_2$ are functionally
connected and thus they cannot be independent. However,
$Z_1^2 = X_1^2$, $Z_2^2 = X_2^2$ and, since $X_1$ and $X_2$ are independent, $Z_1^2$ and
$Z_2^2$ are independent.

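(iii) Here is another illustration. Let the random vector $(X, Y)$ have
the following density (compare with Example 5.8(ii)):

$$
f(x, y) =
\begin{cases}
\frac{1}{4}(1 + xy), & \text{if } |x| < 1 \text{ and } |y| < 1 \\
0, & \text{otherwise.}
\end{cases}
$$

We easily find the marginal densities $f_1(x)$ of $X$ and $f_2(y)$ of $Y$:

$$
f_1(x) =
\begin{cases}
\frac{1}{2}, & \text{if } |x| < 1 \\
0, & \text{otherwise,}
\end{cases}
\qquad
f_2(y) =
\begin{cases}
\frac{1}{2}, & \text{if } |y| < 1 \\
0, & \text{otherwise.}
\end{cases}
$$

Obviously $f(x, y) \ne f_1(x) f_2(y)$ for all $x, y$ in $(-1, 1)$ with $xy \ne 0$,
hence $X$ and $Y$ are dependent.

Each of the variables $X^2$ and $Y^2$ takes values in $(0, 1)$ and for
$x \in (0, 1)$ and $y \in (0, 1)$ we find

$$
\begin{aligned}
P[X^2 < x, Y^2 < y] &= P[-\sqrt{x} < X < \sqrt{x}, -\sqrt{y} < Y < \sqrt{y}] \\
&= \frac{1}{4} \int_{-\sqrt{x}}^{\sqrt{x}} \int_{-\sqrt{y}}^{\sqrt{y}} (1 + uv) \, du \, dv \\
&= \sqrt{x} \sqrt{y} = P[X^2 < x] \, P[Y^2 < y].
\end{aligned}
$$

Thus $X^2$ and $Y^2$ are independent r.v.s.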
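
Case (iii) lends itself to a quick numerical check. The sketch below integrates the density $\frac{1}{4}(1 + uv)$ over rectangles with the midpoint rule (which is exact for this bilinear integrand up to rounding); the helper names and the sample arguments are hypothetical.

```python
import math

def f(u, v):
    """Joint density (1 + u*v)/4 on the open square |u| < 1, |v| < 1."""
    return 0.25 * (1.0 + u * v) if abs(u) < 1 and abs(v) < 1 else 0.0

def prob_box(a1, b1, a2, b2, n=200):
    """P[a1 < X < b1, a2 < Y < b2] by the midpoint rule on an n x n grid."""
    hu, hv = (b1 - a1) / n, (b2 - a2) / n
    total = 0.0
    for i in range(n):
        u = a1 + (i + 0.5) * hu
        for j in range(n):
            total += f(u, a2 + (j + 0.5) * hv)
    return total * hu * hv

def prob_squares(x, y):
    """P[X^2 < x, Y^2 < y] for x, y in (0, 1)."""
    rx, ry = math.sqrt(x), math.sqrt(y)
    return prob_box(-rx, rx, -ry, ry)

# The squares factorize: P[X^2 < x, Y^2 < y] = sqrt(x) * sqrt(y).
squares_joint = prob_squares(0.3, 0.7)
# X and Y themselves do not: P[X > 0, Y > 0] = 5/16, not 1/2 * 1/2.
positive_quadrant = prob_box(0.0, 1.0, 0.0, 1.0)
```

The symmetric integration range kills the $uv$ term for the squares, whereas on a single quadrant the term survives and reveals the dependence of $X$ and $Y$.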


7.7. The independence of random variables in terms of 
characteristic functions 


If $X$ is a r.v. defined on a given probability space $(\Omega, \mathcal{F}, P)$, then the
function $\phi(t) = E[e^{itX}]$, $t \in \mathbb{R}^1$, $i = \sqrt{-1}$, is called a
characteristic function (ch.f.) of $X$. An extensive treatment of ch.f.s
is given in Section 8. Here we illustrate the independence property
of r.v.s in terms of the corresponding ch.f.s.

Let $X_1$, $X_2$ be independent r.v.s and $\phi_1$, $\phi_2$ their characteristic
functions (ch.f.s) respectively. Then the ch.f. $\phi$ of the sum $X_1 + X_2$
is $\phi_1 \phi_2$:

$$
\phi(t) = \phi_1(t) \phi_2(t) \quad \text{for all } t \in \mathbb{R}^1. \tag{1}
$$

We can pose the converse question: if $\phi_1$, $\phi_2$ and $\phi$ are the ch.f.s of
$X_1$, $X_2$ and $X_1 + X_2$, and (1) holds, does it follow that $X_1$ and $X_2$
are independent? Let us show that the answer to this question is
negative.


(i) Let the random vector $(X_1, X_2)$ have density

$$
f(x_1, x_2) =
\begin{cases}
\frac{1}{4}[1 + x_1 x_2 (x_1^2 - x_2^2)], & \text{if } |x_1| \le 1 \text{ and } |x_2| \le 1 \\
0, & \text{otherwise.}
\end{cases}
\tag{2}
$$

First, we find from (2) the marginal densities $f_1$ and $f_2$ of $X_1$ and
$X_2$, namely

$$
f_1(x_1) =
\begin{cases}
\frac{1}{2}, & \text{if } |x_1| \le 1 \\
0, & \text{otherwise,}
\end{cases}
\qquad
f_2(x_2) =
\begin{cases}
\frac{1}{2}, & \text{if } |x_2| \le 1 \\
0, & \text{otherwise.}
\end{cases}
$$

Since $f(x_1, x_2) \ne f_1(x_1) f_2(x_2)$, the r.v.s $X_1$ and $X_2$ are not
independent.

Second, the variables $X_1$ and $X_2$ are identically distributed and
for their ch.f.s $\phi_1$ and $\phi_2$ we can easily show that

$$
\phi_1(t) = \phi_2(t) = t^{-1} \sin t, \quad t \in \mathbb{R}^1.
$$

Third, denote by $g$ the density of the sum $X_1 + X_2$. Then $g$ is
expressed by $f$ from (2) as $g(x) = \int_{-\infty}^{\infty} f(x_1, x - x_1) \, dx_1$ and a
direct integration yields

$$
g(x) =
\begin{cases}
\frac{1}{4}(2 + x), & \text{if } -2 \le x \le 0 \\
\frac{1}{4}(2 - x), & \text{if } 0 < x \le 2 \\
0, & \text{if } |x| > 2.
\end{cases}
$$

Having $g$, we find that the ch.f. $\phi$ of $X_1 + X_2$ is

$$
\phi(t) = t^{-2} \sin^2 t.
$$

Therefore $\phi(t) = \phi_1(t) \phi_2(t)$, that is relation (1) is satisfied, but, as
we saw above, the variables $X_1$ and $X_2$ are dependent.
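
The "direct integration" for the density of the sum can be imitated numerically. The sketch below assumes the density $\frac{1}{4}[1 + x_1 x_2(x_1^2 - x_2^2)]$ on the square $|x_1| \le 1$, $|x_2| \le 1$ and compares $g(x) = \int f(x_1, x - x_1)\,dx_1$, computed by the midpoint rule, with the triangular density of the sum of two independent uniform variables on $(-1, 1)$; the function names and grid size are illustrative.

```python
def f(x1, x2):
    """Density (1 + x1*x2*(x1^2 - x2^2))/4 on the square |x1| <= 1, |x2| <= 1."""
    if abs(x1) <= 1 and abs(x2) <= 1:
        return 0.25 * (1.0 + x1 * x2 * (x1 * x1 - x2 * x2))
    return 0.0

def density_of_sum(x, n=2000):
    """g(x) = integral of f(x1, x - x1) dx1 over the support, midpoint rule."""
    lo, hi = max(-1.0, x - 1.0), min(1.0, x + 1.0)
    if lo >= hi:
        return 0.0
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x1 = lo + (k + 0.5) * h
        total += f(x1, x - x1)
    return total * h

def triangular(x):
    """Density of the sum of two independent uniform(-1, 1) variables."""
    return max(0.0, (2.0 - abs(x)) / 4.0)

sample_points = (-1.5, -0.4, 0.0, 0.9, 1.7)
max_gap = max(abs(density_of_sum(x) - triangular(x)) for x in sample_points)
```

The perturbation term is antisymmetric under $x_1 \leftrightarrow x - x_1$, so it integrates to zero along every line $x_1 + x_2 = x$; this is why the sum cannot detect the dependence.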


(ii) Take $X_1 = X_2 = X$ where $X$ has a Cauchy distribution with
density $1/[\pi(1 + x^2)]$, $x \in \mathbb{R}^1$. If $\phi_1$, $\phi_2$ and $\phi$ are the ch.f.s of $X_1$, $X_2$
and $X_1 + X_2$ respectively, we have $\phi_1(t) = \phi_2(t) = e^{-|t|}$, $\phi(t) = e^{-2|t|}$.
Hence $\phi(t) = \phi_1(t) \phi_2(t)$ for all $t \in \mathbb{R}^1$, but clearly $X_1$ and $X_2$ are not
independent.

Finally, let us recall that the r.v.s $X_1, \ldots, X_n$ with ch.f.s $\phi_1, \ldots, \phi_n$
are independent iff for all real $t_1, \ldots, t_n$

$$
E[\exp(i(t_1 X_1 + \cdots + t_n X_n))] = \phi_1(t_1) \cdots \phi_n(t_n).
$$

Comparing (1) with this general condition enables us to explain the
conclusions obtained in the examples above.


7.8. The independence of random variables in terms of 
generating functions 
If $X$ is a non-negative integer-valued r.v., then the function $p(z) = E[z^X]$
is called a probability generating function (p.g.f.). Recall that
$p(z)$ is defined for all complex numbers $z$ with $|z| \le 1$. Further, if $X$
is an arbitrary r.v., then the function $M(z) = E[e^{zX}]$, $z$ complex, is
called a moment generating function (m.g.f.) of $X$. More on p.g.f.s
and m.g.f.s is included in Section 8. Here we are interested in
expressing the independence property of r.v.s by the corresponding
generating functions.


(i) Let $X$ and $Y$ be independent non-negative integer-valued r.v.s.
Denote by $p_X$, $p_Y$ and $p_{X+Y}$ the probability generating functions
of $X$, $Y$ and $X + Y$ respectively. Then

$$
p_{X+Y}(z) = p_X(z) p_Y(z). \tag{1}
$$

It is natural to ask the following question: if $X$ and $Y$ are non-negative,
integer-valued r.v.s such that (1) is satisfied, does it
follow that $X$ and $Y$ are independent? We show by an example that
in general the answer is negative.

Let $\xi$ and $\eta$ be independent r.v.s such that $\xi$ takes the values 0, 1
and 2 with probability $\frac{1}{3}$ each, and $\eta$ takes the values 0 and 1 with
probabilities $\frac{1}{3}$ and $\frac{2}{3}$ respectively. Define $X = \xi$ and $Y = \xi + \eta$ (mod 3).
Then $Y$ takes the values 0, 1 and 2 with probability $\frac{1}{3}$ each.
Further, the sum $X + Y$ takes the values 0, 1, 2, 3 and 4 with
probabilities $\frac{1}{9}$, $\frac{2}{9}$, $\frac{3}{9}$, $\frac{2}{9}$ and $\frac{1}{9}$ respectively. Obviously relation (1) is
satisfied for the p.g.f.s of $X$, $Y$ and $X + Y$. However, the variables $X$
and $Y$ are not independent; they are functionally dependent.

In addition, we can show that $X$ and $Y$ are uncorrelated (for this
property see Examples 7.10 and 7.11 below).
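
The statements of case (i) can be verified exactly. The sketch assumes that $\xi$ is uniform on $\{0, 1, 2\}$ and that $\eta$ takes the values 0 and 1 with probabilities $1/3$ and $2/3$ (the values forced by the stated law of $X + Y$); since a p.g.f. determines the distribution, relation (1) is checked by comparing the law of $X + Y$ with the convolution of the marginal laws. Helper names are hypothetical.

```python
from fractions import Fraction as Fr

xi_law = {0: Fr(1, 3), 1: Fr(1, 3), 2: Fr(1, 3)}
eta_law = {0: Fr(1, 3), 1: Fr(2, 3)}   # assumed probabilities of eta

# Joint law of X = xi and Y = (xi + eta) mod 3.
joint = {}
for a, pa in xi_law.items():
    for b, pb in eta_law.items():
        key = (a, (a + b) % 3)
        joint[key] = joint.get(key, 0) + pa * pb

def marginal(table, axis):
    """Marginal law of coordinate `axis` (0 for X, 1 for Y)."""
    out = {}
    for key, pr in table.items():
        out[key[axis]] = out.get(key[axis], 0) + pr
    return out

def convolve(law1, law2):
    """Law of the sum of two independent variables with the given laws."""
    out = {}
    for a, pa in law1.items():
        for b, pb in law2.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

law_x, law_y = marginal(joint, 0), marginal(joint, 1)
law_sum = {}
for (x, y), pr in joint.items():
    law_sum[x + y] = law_sum.get(x + y, 0) + pr

def mean(law):
    return sum(v * pr for v, pr in law.items())

covariance = sum(x * y * pr for (x, y), pr in joint.items()) - mean(law_x) * mean(law_y)
```

The law of the sum equals the convolution and the covariance vanishes, although for instance $P[X = 0, Y = 2] = 0$ while $P[X = 0]P[Y = 2] = 1/9$.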


(ii) If $X$ and $Y$ are arbitrary r.v.s which are independent and $M_X$,
$M_Y$ and $M_{X+Y}$ are the m.g.f.s of $X$, $Y$ and $X + Y$ respectively, then

$$
M_{X+Y}(z) = M_X(z) M_Y(z). \tag{2}
$$

As in case (i) we want to know if (2) implies the independence of
$X$ and $Y$. The answer will follow from the example below.

Let $(X, Y)$ be a two-dimensional random vector defined by the
table of the probabilities $p_{ij} = P[X = i, Y = j]$:

$$
\begin{array}{c|ccc}
X \backslash Y & 1 & 2 & 3 \\ \hline
1 & \frac{2}{18} & \frac{1}{18} & \frac{3}{18} \\
2 & \frac{3}{18} & \frac{2}{18} & \frac{1}{18} \\
3 & \frac{1}{18} & \frac{3}{18} & \frac{2}{18}
\end{array}
$$

We can easily find that $X$ and $Y$ are identically distributed r.v.s
taking each of the values 1, 2, 3 with probability $\frac{1}{3}$. The sum $Z = X + Y$
is a r.v. taking the values 2, 3, 4, 5, 6 with probabilities
$\frac{1}{9}$, $\frac{2}{9}$, $\frac{3}{9}$, $\frac{2}{9}$, $\frac{1}{9}$ respectively. Since $X$, $Y$ and $X + Y$ are non-negative
and integer-valued, we can study their properties in terms of the
p.g.f.s. But in all cases we can use m.g.f.s. Thus for the m.g.f.s we
get

$$
M_X(z) = E[e^{zX}] = M_Y(z) = E[e^{zY}] = \tfrac{1}{3}(e^{z} + e^{2z} + e^{3z}),
$$

$$
M_Z(z) = M_{X+Y}(z) = \tfrac{1}{9}(e^{2z} + 2e^{3z} + 3e^{4z} + 2e^{5z} + e^{6z}).
$$

Clearly $M_{X+Y}(z) = M_X(z) M_Y(z)$, i.e. relation (2) is satisfied.
However, the r.v.s $X$ and $Y$ are not independent as can be seen
easily from the table above: $P[X = i, Y = j] \ne P[X = i] P[Y = j]$ for
all $i \ne j$.
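
The example of case (ii) can also be checked with exact arithmetic. The joint table used below (numerators 2, 1, 3 / 3, 2, 1 / 1, 3, 2 over 18) is an assumption: it is the arrangement consistent with the stated marginals, the stated law of $X + Y$ and the remark that $P[X = i, Y = j] \ne P[X = i]P[Y = j]$ for $i \ne j$. Because all variables are integer-valued, equality of m.g.f.s is equivalent to equality of the law of $X + Y$ with the convolution of the marginals, which is what the sketch compares.

```python
from fractions import Fraction as Fr

# Assumed joint table: ROWS[i][j] / 18 = P[X = i + 1, Y = j + 1].
ROWS = [[2, 1, 3],
        [3, 2, 1],
        [1, 3, 2]]
table = {(i + 1, j + 1): Fr(ROWS[i][j], 18) for i in range(3) for j in range(3)}

values = (1, 2, 3)
law_x = {i: sum(table[(i, j)] for j in values) for i in values}
law_y = {j: sum(table[(i, j)] for i in values) for j in values}

# Law of Z = X + Y under the joint table.
law_z = {}
for (i, j), pr in table.items():
    law_z[i + j] = law_z.get(i + j, 0) + pr

# Law of the sum if X and Y were independent (coefficients of M_X * M_Y).
law_conv = {}
for i in values:
    for j in values:
        law_conv[i + j] = law_conv.get(i + j, 0) + law_x[i] * law_y[j]

off_diagonal_mismatch = all(
    table[(i, j)] != law_x[i] * law_y[j] for i in values for j in values if i != j
)
```

Under this table the diagonal cells equal $1/9$, the product of the marginals, and every off-diagonal cell differs from it, while the sum still has the convolution law.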

Finally, let us comment on both cases (i) and (ii). The
independence of two (or more) r.v.s can be expressed in terms of
the p.g.f.s or the m.g.f.s. Let us illustrate this for two variables.

If $(X_1, X_2)$ is a random vector whose components $X_1$ and $X_2$ are
non-negative integer-valued r.v.s, then its p.g.f., say $p(z_1, z_2)$, is
defined as

$$
p(z_1, z_2) = E[z_1^{X_1} z_2^{X_2}], \quad \text{complex } z_1, z_2, \ |z_1| \le 1, \ |z_2| \le 1.
$$

Denote by $p_1(z_1)$ the p.g.f. of $X_1$ and $p_2(z_2)$ the p.g.f. of $X_2$. Then
$X_1$ and $X_2$ are independent iff $p(z_1, z_2) = p_1(z_1) p_2(z_2)$ for all $z_1$ and
$z_2$. For $z_1 = z_2 = z$, the function $p(z, z) = E[z^{X_1 + X_2}]$ is the p.g.f. of
the sum $X_1 + X_2$, in which case $p(z, z) = p_1(z) p_2(z)$. This is exactly
case (i) above where we do not have $p(z_1, z_2) = p_1(z_1) p_2(z_2)$ for all
$z_1$, $z_2$, i.e. we do not have independent $X_1$ and $X_2$. For an arbitrary
random vector $(X_1, X_2)$ the m.g.f. is defined by

$$
M(z_1, z_2) = E[\exp(z_1 X_1 + z_2 X_2)], \quad z_1, z_2 \text{ complex.}
$$

Denote by $M_1(z_1)$ and $M_2(z_2)$ the m.g.f.s of $X_1$ and $X_2$
respectively, with $|z_1| < r$ and $|z_2| < r$ for a given $r > 0$. Then $X_1$ and $X_2$
are independent iff $M(z_1, z_2) = M_1(z_1) M_2(z_2)$ for all $z_1$, $z_2$. If we
take $z_1 = z$, $z_2 = z$ we get the function $M(z, z)$ which is the m.g.f. of
the sum $X_1 + X_2$. Obviously in this case $M(z, z) = M_1(z) M_2(z)$. We
met this equality in case (ii) above. However, in this case
$M(z_1, z_2) = M_1(z_1) M_2(z_2)$ does not hold for all $z_1$ and $z_2$. This explains why
$X_1$ and $X_2$ are not independent.


7.9. The distribution of a sum can be expressed by the 
convolution even if the variables are dependent 

If $X_1$ and $X_2$ are r.v.s with d.f.s $F_1$ and $F_2$ respectively, and $X_1$, $X_2$
are independent, the distribution of the sum $X_1 + X_2$ is $F_1 * F_2$. If
$X_1$ and $X_2$ are absolutely continuous with densities $f_1$ and $f_2$
respectively, then the density of $X_1 + X_2$ is $f_1 * f_2$.

Now we are interested in the converse: what is the connection
between the r.v.s $X_1$ and $X_2$ if we know that the sum $X_1 + X_2$ has
distribution $F_1 * F_2$ or density $f_1 * f_2$? The answer will follow
from an example based on the Cauchy distribution.

Let $f_a(x) = a/[\pi(a^2 + x^2)]$, $x \in \mathbb{R}^1$, be the density of a Cauchy
distribution, where $a > 0$. It is easy to check, for example by using
ch.f.s, that the family of Cauchy densities is closed under
convolutions. Consider two independent r.v.s $\xi$ and $\eta$ each with
density $f_a$. Let $X = \alpha\xi + \beta\eta$, $Y = \gamma\xi + \delta\eta$ where $\alpha$, $\beta$, $\gamma$, $\delta$ are
arbitrary positive real numbers. Then the sum $X + Y$ has density
$f_{(\alpha + \beta + \gamma + \delta)a}$ which is the convolution of the densities $f_{(\alpha + \beta)a}$ of $X$
and $f_{(\gamma + \delta)a}$ of $Y$. Nevertheless, $X$ and $Y$ are not independent.
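
The Cauchy example can be followed through ch.f.s, using the standard fact that a Cauchy variable with scale $a$ has ch.f. $e^{-a|t|}$. The sketch below assumes positive coefficients (the specific numbers are hypothetical) and compares the ch.f. of $X + Y$ with the product of the marginal ch.f.s, and then the joint ch.f. of $(X, Y)$ with that product at a point off the diagonal.

```python
import math

a = 1.0
# Hypothetical positive coefficients alpha, beta, gamma, delta.
alpha, beta, gamma, delta = 0.5, 1.5, 2.0, 0.25

def cauchy_chf(scale, t):
    """Ch.f. exp(-scale * |t|) of the Cauchy law with density f_scale."""
    return math.exp(-scale * abs(t))

def chf_X(t):
    return cauchy_chf((alpha + beta) * a, t)

def chf_Y(t):
    return cauchy_chf((gamma + delta) * a, t)

def joint_chf(s, t):
    """E exp(i(s*X + t*Y)), using the independence of xi and eta."""
    return cauchy_chf(a, s * alpha + t * gamma) * cauchy_chf(a, s * beta + t * delta)

t0 = 0.7
sum_chf = joint_chf(t0, t0)                 # ch.f. of X + Y at t0
product_chf = chf_X(t0) * chf_Y(t0)         # product of the marginal ch.f.s
off_diagonal_joint = joint_chf(1.0, -1.0)   # joint ch.f. at (1, -1)
off_diagonal_product = chf_X(1.0) * chf_Y(-1.0)
```

On the diagonal $s = t$ the two expressions coincide, which is the convolution property; at $(1, -1)$ they differ, which is the dependence of $X$ and $Y$.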


7.10. Discrete random variables which are uncorrelated 
but not independent 


It is a well known result that if $X$ and $Y$ are integrable and
independent r.v.s, they are uncorrelated. The property of
uncorrelatedness is weaker than independence. This will be
demonstrated by a few examples. Here we consider discrete r.v.s;
the absolutely continuous case is treated in Example 7.11.

(i) Let $X$ and $Y$ be r.v.s such that $p_{i,j} = P[X = i, Y = j]$ are given by

$$
p_{1,1} = p_{-1,1} = p_{1,-1} = p_{-1,-1} = \tfrac{1}{4}\varepsilon, \quad
p_{0,1} = p_{0,-1} = p_{1,0} = p_{-1,0} = \tfrac{1}{4}(1 - \varepsilon)
$$

where $0 < \varepsilon < 1$. It is easy to find the marginal distributions of $X$, $Y$
and compute that $EX = 0$, $EY = 0$. Moreover, we also find that
$E[XY] = 0$ and hence the variables $X$ and $Y$ are uncorrelated.
However,

$$
P[X = 0, Y = 0] = 0 \ne P[X = 0] P[Y = 0] = \tfrac{1}{4}(1 - \varepsilon)^2
$$

and thus $X$ and $Y$ are not independent.


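(ii) Let $\Omega = \{1, 2, 3\}$ and let each $\omega \in \Omega$ have probability $\frac{1}{3}$. Define
two r.v.s $X$ and $Y$ by

$$
X(\omega) =
\begin{cases}
1, & \text{if } \omega = 1 \\
0, & \text{if } \omega = 2 \\
-1, & \text{if } \omega = 3,
\end{cases}
\qquad
Y(\omega) =
\begin{cases}
0, & \text{if } \omega = 1 \\
1, & \text{if } \omega = 2 \\
0, & \text{if } \omega = 3.
\end{cases}
$$

Then $EX = 0$, $E[XY] = 0$, so $X$ and $Y$ are uncorrelated. But

$$
P[X = 1, Y = 1] = 0 \ne \tfrac{1}{9} = P[X = 1] P[Y = 1]
$$

and therefore $X$ and $Y$ are not independent.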
(iii) Let $X$ and $Y$ be r.v.s each taking the values $-1, 0, 1$. The joint
probability $p_{ij} = P[X = i, Y = j]$ is given by


$$
p_{1,0} = p_{-1,0} = p_{0,1} = p_{0,-1} = \tfrac{1}{4}.
$$


Then obviously $EX = 0$, $EY = 0$ and $E[XY] = 0$. Thus $X$ and $Y$ are
uncorrelated. Further, the relation $P[X = i, Y = j] = P[X = i] P[Y = j]$
is clearly not valid for all pairs $(i, j)$. So the variables $X$ and $Y$ are
dependent.

(iv) Let $\xi$ be a r.v. taking the values 0, $\frac{1}{2}\pi$ and $\pi$ with probability
$\frac{1}{3}$ each.

Then it is easy to see that $X = \sin \xi$ and $Y = \cos \xi$ are
uncorrelated. However, they are not independent. Moreover, $X$ and
$Y$ are functionally connected: $X^2 + Y^2 = 1$.
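
The computations behind case (i) of this example are short enough to run exactly. The sketch below builds the joint table with corner cells $\varepsilon/4$ and edge cells $(1 - \varepsilon)/4$ and evaluates the covariance together with $P[X = 0, Y = 0]$ and $P[X = 0]P[Y = 0]$; the value $\varepsilon = 1/2$ and the helper names are hypothetical.

```python
from fractions import Fraction as Fr

def case_i_table(eps):
    """Joint law of (X, Y): corners carry eps/4, edge midpoints (1 - eps)/4, centre 0."""
    corner, edge = eps / 4, (1 - eps) / 4
    table = {(i, j): corner for i in (-1, 1) for j in (-1, 1)}
    table.update({(0, 1): edge, (0, -1): edge, (1, 0): edge, (-1, 0): edge})
    table[(0, 0)] = Fr(0)
    return table

def summary(table):
    """Return (covariance, P[X=0, Y=0], P[X=0] * P[Y=0])."""
    ex = sum(i * p for (i, j), p in table.items())
    ey = sum(j * p for (i, j), p in table.items())
    exy = sum(i * j * p for (i, j), p in table.items())
    px0 = sum(p for (i, j), p in table.items() if i == 0)
    py0 = sum(p for (i, j), p in table.items() if j == 0)
    return exy - ex * ey, table[(0, 0)], px0 * py0

eps = Fr(1, 2)   # hypothetical value in (0, 1)
table = case_i_table(eps)
covariance, p00, product00 = summary(table)
```

For any $\varepsilon$ in $(0, 1)$ the covariance is zero while the centre cell has probability 0 and the product of the marginals there is $\frac{1}{4}(1 - \varepsilon)^2 > 0$.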


7.11. Absolutely continuous random variables which are 
uncorrelated but not independent 


(i) Let $X_1$ and $X_2$ have a joint uniform distribution on the unit disk,
i.e. the probability density function $f$ is $f(x_1, x_2) = \pi^{-1}$ if
$x_1^2 + x_2^2 \le 1$ and $f(x_1, x_2) = 0$ otherwise. Simple computation
shows that $E[X_1 X_2] = 0$. Thus the variables $X_1$ and $X_2$ are
uncorrelated. It is very easy to find the marginal densities $f_1$ and $f_2$
and see that $f(x_1, x_2) \ne f_1(x_1) f_2(x_2)$. This means that $X_1$ and $X_2$ are
not independent.

(ii) If $\xi$ is uniformly distributed on the interval $(0, 2\pi)$ then
$X_1 = \sin \xi$ and $X_2 = \cos \xi$ satisfy the relations:

$$
EX_1 = 0, \quad EX_2 = 0, \quad E[X_1 X_2] = 0.
$$

Therefore $X_1$ and $X_2$ are uncorrelated but not independent. They
are functionally dependent: $X_1^2 + X_2^2 = 1$. (See case (iv) of
Example 7.10.)

(iii) Recall that the r.v. $X$ is said to be normally distributed with
parameters $a$ and $\sigma^2$, where $a \in \mathbb{R}^1$, $\sigma^2 > 0$, if $X$ is absolutely
continuous and has a density
$(2\pi\sigma^2)^{-1/2} \exp[-(x - a)^2/(2\sigma^2)]$, $x \in \mathbb{R}^1$. In such a case we use
the following standard notation: $X \sim N(a, \sigma^2)$. Several properties of
the normal distribution will be discussed further in Section 10.
Let $X_1 \sim N(0, 1)$ and $X_2 = X_1^2 - 1$. Then $EX_2 = 0$, $E[X_1 X_2] = 0$.
Hence $X_1$ and $X_2$ are uncorrelated. However they are functionally
dependent.


7.12. Independent random variables have zero 
correlation ratio, but the converse is not true 


Let $X$ and $Y$ be r.v.s such that $EY$ and $VY$ exist. As usual, $E[Y|X]$
denotes the conditional expectation of $Y$ given $X$. The quantity

$$
K_X(Y) = \sqrt{\frac{V[E(Y|X)]}{VY}}
$$

is called a correlation ratio of $Y$ with respect to $X$. Obviously
$0 \le K_X(Y) \le 1$ and $K_X(Y)$ is defined for $Y$ with $VY > 0$ (see Rényi
1970). Note that $K_X(Y)$ gives us information about the mutual
dependence of $X$ and $Y$.

Obviously, if $X$ and $Y$ are independent and $0 < VY < \infty$ then
$K_X(Y) = 0$, but not conversely. To see this, take $(X, Y)$ to be uniformly
distributed on the unit disk $x^2 + y^2 \le 1$. Let $g(y|x)$ be the
conditional density of $Y$ given $X = x$. We have

$$
g(y|x) = 1/[2(1 - x^2)^{1/2}] \quad \text{for } |y| \le (1 - x^2)^{1/2} \text{ and } |x| < 1.
$$

Hence $E[Y|X] = 0$ and consequently $K_X(Y) = 0$, though the
variables $X$ and $Y$ are not independent.


7.13. The relation E[Y|X] = EY almost surely does not
imply that the random variables X and Y are
independent


If $X$ and $Y$ are independent r.v.s on a given probability space and $Y$
is integrable, then $E[Y|X] = EY$ a.s. Now the question is whether or
not the converse is true.

Let $Z$ be any integrable, non-degenerate r.v. which is distributed
symmetrically with respect to zero, and let $X$ be a non-degenerate r.v.
independent of $Z$ and such that $X \ge 1$ a.s. Let $Y = Z/X$. Then $Y$ is
integrable and the conditional expectation $E[Y|X]$ is well defined.
Obviously we have

$$
EZ = 0, \quad EY = 0, \quad E[Y|X] = 0.
$$

Therefore the relation $E[Y|X] = EY$ a.s. is satisfied but the variables
$X$ and $Y$ are dependent.
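
A minimal discrete instance illustrates the construction. The sketch assumes $Z = \pm 1$ with probability $1/2$ each and $X \in \{1, 2\}$ with probability $1/2$ each, independent of $Z$; these particular laws are hypothetical choices satisfying the requirements (symmetric $Z$, $X \ge 1$).

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical laws: Z symmetric about zero, X >= 1, Z and X independent.
z_law = {-1: Fr(1, 2), 1: Fr(1, 2)}
x_law = {1: Fr(1, 2), 2: Fr(1, 2)}

# Joint law of (X, Y) with Y = Z / X.
joint = {}
for (z, pz), (x, px) in product(z_law.items(), x_law.items()):
    key = (x, Fr(z, x))
    joint[key] = joint.get(key, 0) + pz * px

def cond_mean_y(x):
    """E[Y | X = x] computed from the joint table."""
    px = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / px

def prob_y(y):
    return sum(p for (_, yy), p in joint.items() if yy == y)

p_joint = joint.get((2, Fr(1)), 0)      # P[X = 2, Y = 1]
p_product = x_law[2] * prob_y(Fr(1))    # P[X = 2] * P[Y = 1]
```

Both conditional means vanish, so $E[Y|X] = EY = 0$, but $Y = 1$ is impossible when $X = 2$, so the joint probability is 0 while the product of the marginals is $1/8$.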


7.14. There is no relationship between the notions of 
independence and conditional independence 


Intuitively the notions of independence and conditional
independence are close to each other (see the introductory notes to
this section). By a few examples we can show that neither of them
implies the other one. (Also see Example 3.6.)


(i) Let $X_n$, $n \ge 1$, be independent Bernoulli r.v.s, that is $X_n$ are i.i.d.
and each takes two values, 1 and 0, with probabilities $p$ and $1 - p$
respectively, $0 < p < 1$. As usual, let $S_n = X_1 + \cdots + X_n$. Then obviously on
the event $[S_2 = 1]$ we have $P[X_1 = 1|S_2] > 0$ and $P[X_2 = 1|S_2] > 0$, whereas
$P[X_1 = 1, X_2 = 1|S_2] = 0$; that is, the equality

$$
P[X_1 = 1, X_2 = 1|S_2] = P[X_1 = 1|S_2] \, P[X_2 = 1|S_2]
$$

is not satisfied. Therefore the independence property of r.v.s can be
lost under conditioning.
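
The loss of independence under conditioning in case (i) can be seen by direct enumeration. The sketch conditions two independent Bernoulli variables on the event $[S_2 = 1]$; the value $p = 1/3$ and the helper name are hypothetical.

```python
from fractions import Fraction as Fr

p = Fr(1, 3)   # hypothetical success probability

# Joint law of two independent Bernoulli(p) variables.
joint = {(a, b): (p if a else 1 - p) * (p if b else 1 - p)
         for a in (0, 1) for b in (0, 1)}

def cond_prob(event, s):
    """P[event(X1, X2) | S2 = s] by enumeration."""
    ps = sum(pr for (a, b), pr in joint.items() if a + b == s)
    hit = sum(pr for (a, b), pr in joint.items() if a + b == s and event(a, b))
    return hit / ps

first = cond_prob(lambda a, b: a == 1, 1)             # P[X1 = 1 | S2 = 1]
second = cond_prob(lambda a, b: b == 1, 1)            # P[X2 = 1 | S2 = 1]
both = cond_prob(lambda a, b: a == 1 and b == 1, 1)   # P[X1 = 1, X2 = 1 | S2 = 1]
```

Given $S_2 = 1$ each variable equals 1 with conditional probability $1/2$, yet both cannot equal 1 at once, so the conditional joint probability is 0 rather than $1/4$.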
(ii) Let $X_n$, $n \ge 1$, be independent integer-valued r.v.s and
$S_n = X_1 + \cdots + X_n$. Then clearly the r.v.s $S_n$, $n \ge 1$, are dependent. However,
given that the event $[S_2 = k]$ has a positive probability and occurs,
we can easily show that

$$
\begin{aligned}
P[S_1 = i, S_3 = j | S_2 = k] &= \frac{P[S_1 = i, S_2 = k, S_3 = j]}{P[S_2 = k]} \\
&= \frac{P[S_1 = i, S_2 = k] \, P[X_3 = j - k]}{P[S_2 = k]} \\
&= P[S_1 = i | S_2 = k] \, \frac{P[X_3 = j - k] \, P[S_2 = k]}{P[S_2 = k]} \\
&= P[S_1 = i | S_2 = k] \, P[S_3 = j | S_2 = k].
\end{aligned}
$$

Therefore there are dependent r.v.s which are conditionally
independent.
(iii) Consider three r.v.s, $X$, $Y$ and $Z$, with the following joint
distribution:

$$
P[X = k, Y = m, Z = n] = p^3 q^{n-3}
$$

where $0 < p < 1$, $q = 1 - p$, $k = 1, \ldots, m - 1$, $m = 2, \ldots, n - 1$, $n = 3, 4, \ldots$.

Firstly, we can easily find the distributions of the pairs $(X, Y)$,
$(X, Z)$ and $(Y, Z)$, then the marginal distribution of each of $X$, $Y$ and
$Z$ and see in particular that the r.v.s $Z$ and $X$ are dependent.

Further, we have

$$
P[X = k, Y = m] = p^2 q^{m-2}, \quad k = 1, \ldots, m - 1, \ m = 2, 3, \ldots,
$$

$$
P[Z = n | X = k, Y = m] = p q^{n-m-1}, \quad n = m + 1, m + 2, \ldots.
$$

Hence for $k = 1, \ldots, m - 1$ and $m = 2, 3, \ldots$ we can obtain that

$$
E[Z | X = k, Y = m] = \sum_{n=m+1}^{\infty} n p q^{n-m-1} = m + \frac{1}{p}
$$

and write the relation

$$
E[Z | X, Y] = Y + \frac{1}{p} \quad \text{a.s.}
$$

Moreover, for any measurable and bounded function $g$,

$$
E[g(Z) | X = k, Y = m] = \sum_{j=1}^{\infty} g(j + m) p q^{j-1}
$$

so that

$$
E[g(Z) | X, Y] = \sum_{j=1}^{\infty} g(Y + j) p q^{j-1} \quad \text{a.s.}
$$

Obviously the right-hand side of the last equality does not
depend on $X$, which means that $Z$ is conditionally independent of $X$
given $Y$ despite the fact (mentioned above) that $Z$ and $X$ are
dependent r.v.s.


7.15. Mutual independence implies the exchangeability 
of any set of random variables, but not conversely 


Let $X_1, \ldots, X_n$ be i.i.d. r.v.s. Clearly for any permutation $(i_1, \ldots, i_n)$
of $(1, \ldots, n)$, the random vectors $(X_1, \ldots, X_n)$ and $(X_{i_1}, \ldots, X_{i_n})$ have the
same distribution. Thus $X_1, \ldots, X_n$ is a set of exchangeable
variables. However, the converse is not generally true and this is
illustrated by the following examples.

(i) Let $\theta$ be an arbitrary r.v. with values in the interval $(0, 1)$. Let
$Y_1, Y_2, \ldots$ be r.v.s which, conditionally on $\theta$, are independent and
take the values 1 and 0 with probabilities $\theta$ and $1 - \theta$ respectively.
Then for any sequence $u_1, \ldots, u_n$ of 0s and 1s, we have

$$
P[Y_1 = u_1, Y_2 = u_2, \ldots, Y_n = u_n | \theta] = \theta^k (1 - \theta)^{n-k} \tag{1}
$$

where $k = u_1 + \cdots + u_n$ and $n$ is an arbitrary natural number. We
are interested in the properties of the set of r.v.s $Y_1, \ldots, Y_n$. Taking
the expectation of both sides of (1) we find that the probability

$$
P[Y_1 = u_1, \ldots, Y_n = u_n] = E\{P[Y_1 = u_1, \ldots, Y_n = u_n | \theta]\} = E[\theta^k (1 - \theta)^{n-k}]
$$

depends only on the sum $u_1 + \cdots + u_n$, which is $k$, and on $n$ of
course. Therefore $Y_1, \ldots, Y_n$, for any $n$, is a set of exchangeable
variables. However, $Y_1, \ldots, Y_n$ are not mutually independent.
Indeed, $P[Y_j = 1] = E\theta$ for each $j \ge 1$. Further, (1) implies that
$P[Y_1 = 1, \ldots, Y_n = 1] = E[\theta^n]$. On the other hand, mutual independence
would require $P[Y_1 = 1, \ldots, Y_n = 1] = \prod_{j=1}^{n} P[Y_j = 1] = (E\theta)^n$. But
$\theta$ is an arbitrary r.v. with values in $(0, 1)$. If, for example, $\theta$ is
uniformly distributed on $(0, 1)$, then for $n \ge 2$ we have
$(E\theta)^n = (\frac{1}{2})^n \ne 1/(n + 1) = E[\theta^n]$. This justifies our statement that
$Y_1, \ldots, Y_n$ are not mutually independent.
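
For uniformly distributed $\theta$ the probabilities in case (i) have a closed form, since $E[\theta^k(1 - \theta)^{n-k}]$ is the Beta integral $k!(n - k)!/(n + 1)!$. The sketch below uses this standard identity (the function name is hypothetical) to confirm that the sequence probabilities form a distribution and that the product rule fails.

```python
from fractions import Fraction as Fr
from itertools import product
from math import factorial

def seq_prob(u):
    """P[Y_1 = u_1, ..., Y_n = u_n] = E[theta^k (1 - theta)^(n - k)] for uniform theta.

    Uses the Beta integral: int_0^1 t^k (1 - t)^(n - k) dt = k! (n - k)! / (n + 1)!.
    """
    n, k = len(u), sum(u)
    return Fr(factorial(k) * factorial(n - k), factorial(n + 1))

total_mass = sum(seq_prob(u) for u in product((0, 1), repeat=3))
p_one = seq_prob((1,))          # P[Y_1 = 1] = E[theta]
p_both_ones = seq_prob((1, 1))  # P[Y_1 = 1, Y_2 = 1] = E[theta^2]
```

The probability of a sequence depends on it only through the number of ones, which is exchangeability, while $P[Y_1 = 1, Y_2 = 1] = 1/3$ exceeds $P[Y_1 = 1]P[Y_2 = 1] = 1/4$.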

(ii) Suppose that an urn containing balls of two colours, say $w$
white and $b$ black, is used, and after each draw the chosen ball is
returned, together with $s$ balls of the same colour. Introduce the
r.v.s $Y_1, \ldots, Y_n$ such that

$$
Y_i =
\begin{cases}
1, & \text{if the } i\text{th draw is black} \\
0, & \text{if the } i\text{th draw is white.}
\end{cases}
$$

It can be shown that the variables $Y_1, \ldots, Y_n$ are not independent
but they are exchangeable. The last statement follows from the fact
that $P[\bigcap_{i=1}^{n} (Y_i = y_i)]$ depends only on the sum $\sum_{i=1}^{n} y_i$ (for
details we refer the reader to Johnson and Kotz (1977)).
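
The urn scheme of case (ii) is easy to evaluate exactly, draw by draw. The sketch below computes the probability of a given colour sequence and compares it across all rearrangements of the sequence; the urn composition $w = 2$, $b = 3$, $s = 1$ and the function name are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import permutations

def polya_seq_prob(seq, w, b, s):
    """Probability of the draw sequence `seq` (1 = black, 0 = white).

    After each draw the ball is returned together with s balls of its colour.
    """
    white, black = w, b
    prob = Fr(1)
    for y in seq:
        total = white + black
        if y == 1:
            prob *= Fr(black, total)
            black += s
        else:
            prob *= Fr(white, total)
            white += s
    return prob

W, B, S = 2, 3, 1   # hypothetical urn
probs = {perm: polya_seq_prob(perm, W, B, S) for perm in set(permutations((1, 1, 0, 0)))}
p_black_first = polya_seq_prob((1,), W, B, S)
p_black_second = polya_seq_prob((0, 1), W, B, S) + polya_seq_prob((1, 1), W, B, S)
p_black_twice = polya_seq_prob((1, 1), W, B, S)
```

All six arrangements of two black and two white draws have the same probability, yet drawing black twice in a row is more likely than the product of the single-draw probabilities.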


7.16. Different kinds of monotone dependence between 
random variables 


Recall that the r.v. $Y$ is said to be completely dependent on the r.v.
$X$ if there exists a function $g$ such that

$$
P[Y = g(X)] = 1.
$$

Another measure of dependence between two non-degenerate
r.v.s $X$ and $Y$ is that of sup correlation, defined by

$$
\bar{\rho}(X, Y) = \sup \rho(f(X), g(Y))
$$

where the supremum is taken over all measurable $f$ and $g$ with
$0 < V[f(X)] < \infty$, $0 < V[g(Y)] < \infty$ and $\rho$ is the ordinary correlation
coefficient.

Let $X$ and $Y$ be absolutely continuous r.v.s. They are called
monotone dependent if there exists a monotone function $g$ for
which $P[Y = g(X)] = 1$.

The quantity

$$
\rho^*(X, Y) = \sup \rho(f(X), g(Y))
$$

where the supremum is taken over all monotone functions $f$ and $g$
such that $0 < V[f(X)] < \infty$ and $0 < V[g(Y)] < \infty$, is said to be a
monotone correlation.

Let us try to compare these kinds of monotone dependence. It is
clear that if $X$ and $Y$ are monotone dependent, then their monotone
correlation is 1. However, the converse statement is false. Indeed,
let $(X, Y)$ have a uniform distribution over the region
$[(0, 1) \times (0, 1)] \cup [(1, 2) \times (1, 2)]$. Then

$$
\rho^*(X, Y) \ge \rho(1_{(0,1)}(X), 1_{(0,1)}(Y)) = 1
$$

but $X$ and $Y$ are not monotone dependent.
Further, it is obvious that

$$
|\rho(X, Y)| \le \rho^*(X, Y) \le \bar{\rho}(X, Y). \tag{1}
$$

For a bivariate normally distributed $(X, Y)$, it is well known that
$|\rho(X, Y)| = \bar{\rho}(X, Y)$, and in this case we should have equalities in (1).
On the other hand, it can easily be seen that in general $\rho^*$ is not
equal to $\bar{\rho}$. Indeed, take $(X, Y)$ with a uniform distribution on the
region

$$
[(0, 1) \times (0, 1)] \cup [(0, 1) \times (2, 3)] \cup [(1, 2) \times (1, 2)] \cup [(2, 3) \times (2, 3)].
$$

Let $f(x) = 1_{(0,1)}(x) + 1_{(2,3)}(x)$. Then $\rho^*(X, Y) < 1$, but

$$
\bar{\rho}(X, Y) \ge \rho(f(X), f(Y)) = 1.
$$
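
The last correlation can be computed by hand or by a few lines of code, because $f$ is constant on each unit interval and the four squares of the region carry equal probability. The sketch below encodes each square by its pair of intervals; the representation and names are illustrative assumptions.

```python
import math
from fractions import Fraction as Fr

# The four unit squares (x-interval, y-interval) of the region, each with mass 1/4.
CELLS = [((0, 1), (0, 1)), ((0, 1), (2, 3)), ((1, 2), (1, 2)), ((2, 3), (2, 3))]

def f(interval):
    """f = 1 on (0, 1) and (2, 3), and 0 on (1, 2); constant on each unit interval."""
    return 1 if interval in ((0, 1), (2, 3)) else 0

def correlation_of_f():
    """Ordinary correlation coefficient of f(X) and f(Y) under the uniform law."""
    w = Fr(1, len(CELLS))
    pairs = [(f(ix), f(iy)) for ix, iy in CELLS]
    eu = sum(w * u for u, _ in pairs)
    ev = sum(w * v for _, v in pairs)
    cov = sum(w * u * v for u, v in pairs) - eu * ev
    var_u = sum(w * u * u for u, _ in pairs) - eu * eu
    var_v = sum(w * v * v for _, v in pairs) - ev * ev
    return float(cov) / math.sqrt(float(var_u) * float(var_v))

rho = correlation_of_f()
```

On every square $f(X) = f(Y)$, so the correlation equals 1; since $f$ is not monotone, this bounds the sup correlation but says nothing about the monotone correlation.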


SECTION 8. CHARACTERISTIC AND GENERATING 
FUNCTIONS 


Let $X$ be a r.v. defined on the probability space $(\Omega, \mathcal{F}, P)$. The
function

$$
\phi(t) = E[e^{itX}], \quad t \in \mathbb{R}^1, \ i = \sqrt{-1} \tag{1}
$$

is said to be a characteristic function (ch.f.) of $X$. If $F(x)$, $x \in \mathbb{R}^1$,
is the d.f. of $X$ then $\phi(t) = \int_{-\infty}^{\infty} e^{itx} \, dF(x)$. Thus
$\phi(t) = \int_{-\infty}^{\infty} e^{itx} f(x) \, dx$ if $X$ is absolutely continuous with density
$f$ and $\phi(t) = \sum_{n} e^{itx_n} p_n$ if $X$ is discrete with
$P[X = x_n] = p_n$, $p_n > 0$, $\sum_{n} p_n = 1$. Recall some of the basic
properties of a ch.f.


(i) $\phi(0) = 1$, $\phi(-t) = \overline{\phi(t)}$, $|\phi(t)| \le 1$, $t \in \mathbb{R}^1$.
(ii) If $E[X^n]$ exists, then $\phi^{(n)}(0)$ exists and $E[X^n] = i^{-n} \phi^{(n)}(0)$.
(iii) If $\phi^{(n)}(0)$ exists and $n$ is even, then $E[X^n]$ exists; if $n$ is odd,
then $E[X^{n-1}]$ exists.
(iv) If $E[X^n]$ exists (and hence $E[X^k]$ exists for $k < n$) then

$$
\phi(t) = \sum_{k=0}^{n} \frac{(it)^k}{k!} E[X^k] + o(t^n)
$$

in the neighbourhood of the origin.
(v) $\phi(t)$, $t \in \mathbb{R}^1$, is a ch.f. iff $\phi$ is continuous, $\phi(0) = 1$ and $\phi$ is
positive definite.
(vi) If $X_1$ and $X_2$ are r.v.s with ch.f.s $\phi_1$ and $\phi_2$, and $X_1$ and $X_2$ are
independent, then the ch.f. $\phi$ of the sum $X_1 + X_2$ is given by

$$
\phi(t) = \phi_1(t) \phi_2(t), \quad t \in \mathbb{R}^1.
$$

(vii) If we know the ch.f. $\phi$ of a r.v. $X$ then we can find the d.f. $F$
of $X$ by the so-called inversion formula and, moreover, if $\phi$ is
absolutely integrable over $\mathbb{R}^1$ then $X$ is absolutely continuous
and its density is the inverse Fourier transform of $\phi$.
Let us introduce two other functions which, like the ch.f. $\phi$, are
essentially used in probability theory. For an arbitrary r.v. $X$ with a
d.f. $F$ denote

$$
M(z) = E[e^{zX}] = \int_{-\infty}^{\infty} e^{zx} \, dF(x), \quad z \text{ a complex number.} \tag{2}
$$

Suppose for some real $r > 0$ the function $M(z)$ is well defined for
all $z$, $|z| < r$. In such a case $M$ is called a moment generating
function (m.g.f.) of $X$ and also of $F$. The relationship between the
m.g.f. $M$ and the ch.f. $\phi$, see (1), is obvious: $M(it) = \phi(t)$ for real $t$.

If $X$ is a non-negative integer-valued r.v. we can introduce the
function

$$
p(z) = E[z^X], \quad z \text{ complex} \tag{3}
$$

which is called a probability generating function (p.g.f.) of $X$.
(Note that the m.g.f. and p.g.f. were briefly introduced in Example
7.8 and used to analyse the independence property.)

Some of the properties of $\phi$ listed above can be reformulated for
the generating functions $M$ and $p$. However, note that the ch.f. of a
distribution always exists while the m.g.f. need not always exist
(excluding the trivial case when $t = 0$).

The ch.f. $\phi$ is called analytic if there is a number $r > 0$ such that
$\phi$ can be represented by a convergent power series in the interval
$(-r, r)$, that is if $\phi(t) = \sum_{k=0}^{\infty} a_k t^k / k!$, $t \in (-r, r)$, with some
complex coefficients $a_k$. The following important result is often
used (see Lukacs 1970; Chow and Teicher 1978). If $F$ and $\phi$ are a
pair of a d.f. and a ch.f., then the following conditions are
equivalent: (a) $\phi$ is $r$-analytic; (b) the moments
$m_k = \int x^k \, dF(x)$, $k \ge 1$, are finite and $\phi$ admits the
representation $\phi(t) = \sum_{k=0}^{\infty} m_k (it)^k / k!$, $t \in (-r, r)$;
(c) $\int e^{t|x|} \, dF(x) < \infty$, $0 \le t < r$. Usually (c) is called the Cramér condition.

Clearly, the m.g.f. $M$ does exist iff the corresponding ch.f. $\phi$ is
analytic.

We say that the ch.f. $\phi$ is decomposable (or factorizable) if

$$
\phi(t) = \phi_1(t) \phi_2(t), \quad t \in \mathbb{R}^1
$$

where $\phi_1$ and $\phi_2$ are both ch.f.s of non-degenerate distributions. If
$\phi$ admits only a trivial product representation (that is, if $\phi_1$ or $\phi_2$ is
of the form $e^{iat}$, $a$ = constant), it is called indecomposable.

We refer the reader to the books by Lukacs (1970), 
Ramachandran (1967), Feller (1971), Chow and Teicher (1978), 
Rao (1984), Shiryaev (1995) and Bauer (1996) where the theory of 
characteristic functions and related topics can be found in detail. 

In this section we have included various counterexamples which 
explain the meaning of some of the properties of ch.f.s and of 
generating functions. 


8.1. Different characteristic functions which coincide on 
a finite interval but not on the whole real line 


Suppose $\phi_1$, $\phi_2$ are ch.f.s such that $\phi_1(t) = \phi_2(t)$ for $t \in [-l, l]$ where
$l$ is an arbitrary positive number. Does it then follow that $\phi_1(t)$
coincides with $\phi_2(t)$ for all $t \in \mathbb{R}^1$? This important problem was
considered and solved almost 60 years ago by Gnedenko (1937).
Let us present his solution.

Consider the function $h(x) = 0$ if $|x| > \pi/2$ and $h(x) = x$ if $|x| \le \pi/2$.
If $c(t) = \int_{-\infty}^{\infty} h(x) h(x + t) \, dx$, then the ratio $\phi_1(t) = c(t)/c(0)$
is a ch.f. An easy calculation shows that

$$
\phi_1(t) =
\begin{cases}
1 + 3\pi^{-1} t - 2\pi^{-3} t^3, & \text{if } -\pi \le t \le 0 \\
1 - 3\pi^{-1} t + 2\pi^{-3} t^3, & \text{if } 0 \le t \le \pi \\
0, & \text{if } |t| > \pi.
\end{cases}
$$

Now introduce another function, say $\phi_2$, as follows:

$$
\phi_2(t) = \phi_1(t), \ \text{if } |t| \le \pi; \qquad \phi_2(t + 2\pi) = \phi_2(t), \ \text{if } t \in \mathbb{R}^1.
$$

Let us show that $\phi_2$ is a ch.f. Obviously $\phi_2$ is an even function
with the Fourier expansion

$$
\tfrac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos nt. \tag{1}
$$

A standard calculation shows that

$$
a_0 = 0, \quad a_n = 6\pi^{-2} [n^{-2}(1 + \cos n\pi) + 4\pi^{-2} n^{-4}(1 - \cos n\pi)], \quad n = 1, 2, \ldots
$$

Thus the series (1) converges uniformly, its coefficients are
non-negative and their sum equals $\phi_2(0) = 1$. Hence
$\phi_2(t) = \int_{-\infty}^{\infty} e^{itx} \, dF(x)$ for some d.f. $F$. This means that $\phi_2$ is
a ch.f.

Therefore we have that $\phi_2(t) = \phi_1(t)$ for $t \in [-\pi, \pi]$ but not for all
$t \in \mathbb{R}^1$. In a similar way we can construct two ch.f.s $\phi_1$ and $\phi_2$ which
coincide on the interval $[-l, l]$ for an arbitrarily large $l$ but not for all
$t \in \mathbb{R}^1$.

Note finally that at the end of Gnedenko's paper we can find
a very important remark made by A. Ya. Khintchine concerning
the above result. Let $F_1$ and $F_2$ be the d.f.s corresponding to $\phi_1$
and $\phi_2$. The above reasoning implies the equality

$$
\phi_1(t) \phi_1(t) = \phi_1(t) \phi_2(t) \quad \text{for all } t \in \mathbb{R}^1
$$

which is equivalent to the relation

$$
F_1 * F_1 = F_1 * F_2. \tag{2}
$$

Equation (2) states that there exists a d.f. whose convolutions with
two different d.f.s coincide. In other words, the convolution
equality (2) does not in general imply that $F_1 = F_2$.
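
Gnedenko's construction can be explored numerically. The sketch below assumes the closed form $\phi_1(t) = 1 - 3|t|/\pi + 2|t|^3/\pi^3$ on $[-\pi, \pi]$ and the Fourier coefficients $a_n = 6\pi^{-2}[n^{-2}(1 + \cos n\pi) + 4\pi^{-2}n^{-4}(1 - \cos n\pi)]$, and checks that the coefficients are non-negative with sum 1, that the series reproduces $\phi_1$ inside $[-\pi, \pi]$, and that the periodic extension $\phi_2$ differs from $\phi_1$ outside; function names and the truncation level are illustrative.

```python
import math

def phi1(t):
    """phi_1(t) = 1 - 3|t|/pi + 2|t|^3/pi^3 on [-pi, pi], and 0 elsewhere."""
    s = abs(t)
    if s > math.pi:
        return 0.0
    return 1.0 - 3.0 * s / math.pi + 2.0 * s ** 3 / math.pi ** 3

def phi2(t):
    """2*pi-periodic extension of phi_1 from [-pi, pi]."""
    r = math.fmod(t, 2.0 * math.pi)
    if r > math.pi:
        r -= 2.0 * math.pi
    elif r < -math.pi:
        r += 2.0 * math.pi
    return phi1(r)

def a(n):
    """Fourier cosine coefficient a_n of phi_2, using cos(n*pi) = (-1)^n."""
    c = 1.0 if n % 2 == 0 else -1.0
    return 6.0 / math.pi ** 2 * ((1.0 + c) / n ** 2
                                 + 4.0 * (1.0 - c) / (math.pi ** 2 * n ** 4))

def fourier_partial_sum(t, terms=20000):
    """Partial sum of the series (1); a_0 = 0, so the constant term is absent."""
    return sum(a(n) * math.cos(n * t) for n in range(1, terms + 1))

coefficient_sum = fourier_partial_sum(0.0)
series_at_one = fourier_partial_sum(1.0)
```

The non-negative coefficients are the probabilities that the distribution with ch.f. $\phi_2$ assigns to the points $\pm n$ (half of $a_n$ each), so $\phi_2$ belongs to a purely discrete law while $\phi_1$ belongs to an absolutely continuous one.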


8.2. Discrete and absolutely continuous distributions can
have characteristic functions coinciding on the
interval [-1, 1]


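Let $X$ be a r.v. whose ch.f. $\varphi_1$ is given by

(1) $$\varphi_1(t) = \begin{cases} 1 - |t|, & \text{if } |t| \le 1 \\ 0, & \text{otherwise.} \end{cases}$$

Obviously $\varphi_1$ is absolutely integrable on $\mathbb{R}^1$ and the
density $f$ of $X$ is

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi_1(t)\,dt = \frac{1 - \cos x}{\pi x^2}, \quad x \in \mathbb{R}^1.$$

Consider now the r.v. $Y$ where

$$P[Y = 0] = \tfrac12, \quad P[Y = (2k-1)\pi] = \frac{2}{(2k-1)^2\pi^2}, \quad k = 0, \pm 1, \pm 2, \ldots.$$

If $\varphi_2$ is the ch.f. of $Y$, then

(2) $$\varphi_2(t) = \frac12 + \frac{4}{\pi^2}\sum_{k=1}^{\infty}\frac{\cos(2k-1)\pi t}{(2k-1)^2}.$$

Let us show that $\varphi_1$ given by (1) equals $\varphi_2$ given by (2) for
each $t \in [-1, 1]$. The function $h(t) = |t|$ has the following Fourier
expansion: $h(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty} a_n \cos n\pi t$,
$|t| \le 1$, where $a_0 = 1$, $a_n = 2(\cos n\pi - 1)/(n^2\pi^2)$. For even
$n$, $a_n = 0$ and for odd $n$, that is for $n = 2k - 1$, we have
$a_{2k-1} = -4/((2k-1)^2\pi^2)$. Now comparing $\varphi_1(t)$ and
$\varphi_2(t)$ we conclude that $\varphi_1(t) = \varphi_2(t)$ for each
$t \in [-1, 1]$. Nevertheless $\varphi_1$ and $\varphi_2$ correspond to quite
different distributions, one of which is absolutely continuous while the
other is purely discrete. Note additionally that
$\varphi_1(t) \ne \varphi_2(t)$ for $|t| > 1$, except at the odd integers,
where the periodic function $\varphi_2$ also vanishes.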
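
The coincidence on [−1, 1] can also be checked numerically. The sketch below is only an illustration: the helper names `phi1`, `phi2` and `max_gap` and the truncation level `terms` are assumptions made here, and the infinite series (2) is replaced by a finite partial sum.

```python
import math

def phi1(t):
    """Triangular ch.f. (1): 1 - |t| on [-1, 1], zero elsewhere."""
    return max(0.0, 1.0 - abs(t))

def phi2(t, terms=5000):
    """Partial sum of the cosine series (2) for the ch.f. of Y."""
    s = sum(math.cos((2 * k - 1) * math.pi * t) / (2 * k - 1) ** 2
            for k in range(1, terms + 1))
    return 0.5 + 4.0 / math.pi ** 2 * s

def max_gap(points):
    """Largest absolute difference between phi1 and phi2 on the given points."""
    return max(abs(phi1(t) - phi2(t)) for t in points)

inside = [i / 10 for i in range(-10, 11)]    # grid on [-1, 1]
gap_inside = max_gap(inside)                 # small: only truncation error
gap_outside = abs(phi1(2.0) - phi2(2.0))     # phi2 has period 2, so phi2(2) is close to 1
```

On the grid inside [−1, 1] the two functions agree up to the truncation error of the series, while at t = 2 they differ by almost 1.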


8.3. The absolute value of a characteristic function is not 
necessarily a characteristic function 


If $\varphi$ is a ch.f., then it is of general interest to know whether
$|\varphi|$ is also a ch.f. Consider the function

$$\varphi(t) = \tfrac18(1 + 7e^{it}), \quad t \in \mathbb{R}^1.$$

Obviously $\varphi$ is a ch.f. of a r.v. taking two values. We now want to
know whether

$$|\varphi(t)| = (|\varphi(t)|^2)^{1/2} = (\varphi(t)\overline{\varphi(t)})^{1/2} = \tfrac18(50 + 7e^{-it} + 7e^{it})^{1/2}$$

is a ch.f. If the answer were positive then $\psi := |\varphi|$ must be of
the form

$$\psi(t) = pe^{itx_1} + (1 - p)e^{itx_2}$$

where $0 < p < 1$ and $x_1$, $x_2$ are different real numbers. Comparing
$\psi^2$ and $|\varphi|^2$ we see that $p$ should satisfy the relations

$$p^2 = (1 - p)^2 = \tfrac{7}{64}, \quad 2p(1 - p) = \tfrac{50}{64}$$

which are obviously incompatible. Hence $|\varphi|$ is not a ch.f. although
$\varphi$ is.
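
A numerical cross-check by a different route than the argument above: $|\varphi|$ is continuous and $2\pi$-periodic, and a $2\pi$-periodic ch.f. must belong to a distribution on the integers whose probabilities are the Fourier coefficients of the ch.f. The sketch below (the function names and the sample count are assumptions of this illustration) approximates these coefficients; a negative one rules out $|\varphi|$ as a ch.f.

```python
import cmath
import math

def phi(t):
    """Ch.f. of the two-point distribution with masses 1/8 at 0 and 7/8 at 1."""
    return (1 + 7 * cmath.exp(1j * t)) / 8

def fourier_coeff(n, samples=4096):
    """Approximate (1/2pi) * integral over [-pi, pi] of |phi(t)| cos(nt) dt
    by an equally spaced sum; |phi| is even, so the sine part vanishes."""
    total = 0.0
    for j in range(samples):
        t = -math.pi + 2 * math.pi * j / samples
        total += abs(phi(t)) * math.cos(n * t)
    return total / samples

# Candidate "probabilities" at the points 0, 1, 2, 3.
weights = [fourier_coeff(n) for n in range(4)]
```

The entry `weights[2]` comes out negative, which is incompatible with a probability distribution.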


8.4. The ratio of two characteristic functions need not be 
a characteristic function 


Let $\varphi_1$ and $\varphi_2$ be ch.f.s. Is it true that the ratio
$\varphi_1/\varphi_2$ is also a ch.f.?

The answer is based on the following result (see Lukacs 1970). A
necessary condition for a function, analytic in some neighbourhood
of the origin, to be a ch.f., is that in either half-plane the singularity
nearest to the real axis is located on the imaginary axis. Consider
the following two functions

$$\varphi_1(t) = \left[\left(1 - \frac{it}{a}\right)\left(1 - \frac{it}{a + ib}\right)\left(1 - \frac{it}{a - ib}\right)\right]^{-1}, \quad \varphi_2(t) = \left(1 - \frac{it}{a}\right)^{-1}, \quad t \in \mathbb{R}^1$$

where $a \ge b > 0$. One can check that both $\varphi_1$ and $\varphi_2$ are
analytic ch.f.s. Furthermore, their quotient
$\psi(t) = \varphi_1(t)/\varphi_2(t)$ satisfies some of the elementary
properties of ch.f.s, namely $\psi(-t) = \overline{\psi(t)}$,
$|\psi(t)| \le \psi(0) = 1$ for all $t \in \mathbb{R}^1$. However, the
condition in the result cited above is violated since $\psi$ has no
singularity on the imaginary axis while it has a pair of conjugate
complex poles $\pm b - ia$.

Therefore in general the ratio of two ch.f.s is not a ch.f.


8.5. The factorization of a characteristic function into 
indecomposable factors may not be unique 


We shall give two examples concerning the discrete and the 
absolutely continuous case respectively. 


(i) The function $\varphi(t) = \tfrac16\sum_{k=0}^{5} e^{itk}$ is the ch.f.
of a discrete uniform distribution on the set {0, 1, 2, 3, 4, 5}. Take the
functions

$$\varphi_1(t) = \tfrac13(1 + e^{2it} + e^{4it}), \quad \varphi_2(t) = \tfrac12(1 + e^{it}),$$

$$\psi_1(t) = \tfrac13(1 + e^{it} + e^{2it}), \quad \psi_2(t) = \tfrac12(1 + e^{3it}).$$

Obviously we have

$$\varphi(t) = \varphi_1(t)\varphi_2(t) = \psi_1(t)\psi_2(t).$$

It is easy to see that $\varphi_1$, $\varphi_2$, $\psi_1$, $\psi_2$ are all
ch.f.s of some (discrete) distributions. Moreover, $\psi_2$ and
$\varphi_2$ correspond to two-point distributions and hence they are
indecomposable (see Gnedenko and Kolmogorov 1954; Lukacs 1970). Thus it
only remains to show that $\psi_1$ and $\varphi_1$ are also indecomposable.
Suppose that $\psi_1(t) = \psi_{11}(t)\psi_{12}(t)$, where $\psi_{11}$ and
$\psi_{12}$ are non-trivial factors. Clearly $\psi_1$ corresponds to a
distribution, say $G_1$, concentrated at three points, 0, 1, 2, each with
probability $\tfrac13$. However, the discontinuity points of $G_1$ are of
the type $x_j + y_k$ where $x_j$ and $y_k$ are discontinuity points of the
distributions corresponding to the ch.f.s $\psi_{11}$ and $\psi_{12}$
respectively (see Lukacs 1970).
Since $G_1$ has three discontinuity points and $\psi_{11}$, $\psi_{12}$ are
non-trivial, we conclude that each factor corresponds to a two-point
distribution, that is

$$\psi_{11}(t) = pe^{itx_1} + (1 - p)e^{itx_2}, \quad \psi_{12}(t) = qe^{ity_1} + (1 - q)e^{ity_2}$$

where $0 < p < 1$, $0 < q < 1$. But $\psi_1(t) = \psi_{11}(t)\psi_{12}(t)$
implies that $p$, $q$ must satisfy the relations

$$pq = (1 - p)(1 - q) = p(1 - q) + q(1 - p) = \tfrac13.$$

Clearly this is not possible.
We have therefore shown that $\psi_1$ is indecomposable and, since
$\varphi_1(t) = \psi_1(2t)$, we conclude that $\varphi_1$ is also
indecomposable.
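
The two factorizations in (i) can be confirmed with exact arithmetic, since multiplying ch.f.s corresponds to convolving the probability mass functions. In the sketch below each distribution is a list indexed by the values 0, 1, 2, ...; the helper name `convolve` and the list encoding are assumptions of this illustration.

```python
from fractions import Fraction

def convolve(p, q):
    """pmf of the sum of two independent integer-valued r.v.s with pmfs p, q,
    each given as a list indexed by the value 0, 1, 2, ..."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

third, half = Fraction(1, 3), Fraction(1, 2)
phi1 = [third, 0, third, 0, third]   # mass 1/3 at 0, 2, 4
phi2 = [half, half]                  # mass 1/2 at 0, 1
psi1 = [third, third, third]         # mass 1/3 at 0, 1, 2
psi2 = [half, 0, 0, half]            # mass 1/2 at 0, 3

uniform6 = [Fraction(1, 6)] * 6
first = convolve(phi1, phi2)
second = convolve(psi1, psi2)
```

Both `first` and `second` equal the uniform distribution on {0, ..., 5}, although the factors are different.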


(ii) Consider now a uniform distribution over the interval (−1, 1).
The ch.f. $\varphi$ of this distribution is

$$\varphi(t) = t^{-1}\sin t, \quad t \in \mathbb{R}^1.$$

Using the elementary formula
$t^{-1}\sin t = \cos(t/2)\,(t/2)^{-1}\sin(t/2)$ we obtain

$$\varphi(t) = t^{-1}\sin t = \Big[\prod_{k=1}^{n}\cos(t/2^k)\Big](t/2^n)^{-1}\sin(t/2^n).$$

Passing to the limit as $n \to \infty$, we get the following well
known representation:

(1) $$\varphi(t) = t^{-1}\sin t = \prod_{k=1}^{\infty}\cos(t/2^k).$$

Now it only remains for us to show that $\cos(t/2^k)$ is an
indecomposable ch.f. This is a consequence of the equality
$\cos(t/2^k) = \tfrac12(e^{it/2^k} + e^{-it/2^k})$ which implies that
$\cos(t/2^k)$ is a ch.f. of a distribution concentrated at two points and
hence it is indecomposable.

Another factorization can be obtained by using the formula

$$t^{-1}\sin t = (t/3)^{-1}\sin(t/3)\,[2\cos(2t/3) + 1]/3.$$

In this case we have

(2) $$\varphi(t) = t^{-1}\sin t = \tfrac13\big(2\cos(2t/3) + 1\big)\prod_{k=1}^{\infty}\cos\big(t/(3\cdot 2^k)\big).$$

The first factor in (2) is the ch.f. of the uniform distribution on the
three points $-\tfrac23$, $0$, $\tfrac23$, which is indecomposable by the
argument used in case (i). It follows from (2) that $\varphi$ is a product
of indecomposable factors and obviously (1) and (2) are different
factorizations of the ch.f. $\varphi$.
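
Both infinite products can be compared with $t^{-1}\sin t$ numerically. The following sketch truncates each product after `n` factors; the function names and the truncation level are assumptions of this illustration.

```python
import math

def phi(t):
    """Ch.f. of the uniform distribution on (-1, 1)."""
    return math.sin(t) / t if t != 0 else 1.0

def product_1(t, n=60):
    """Truncated version of the product of cos(t / 2^k), k = 1..n."""
    return math.prod(math.cos(t / 2 ** k) for k in range(1, n + 1))

def product_2(t, n=60):
    """Truncated version of factorization (2): a three-point factor
    times the product of cos(t / (3 * 2^k)), k = 1..n."""
    head = (2 * math.cos(2 * t / 3) + 1) / 3
    tail = math.prod(math.cos(t / (3 * 2 ** k)) for k in range(1, n + 1))
    return head * tail

checks = [(phi(t), product_1(t), product_2(t)) for t in (0.5, 2.0, 7.3)]
```

For each test point the three numbers in `checks` agree to floating-point accuracy.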


8.6. An absolutely continuous distribution can have a 
characteristic function which is not absolutely 
integrable 


Let $\varphi$ be a ch.f. and $F$ be its d.f. Recall that if $\varphi$ is
absolutely integrable on $\mathbb{R}^1$, then $F$ is absolutely continuous
and the density $f = F'$ is the inverse Fourier transform of $\varphi$ (see
Feller 1971; Lukacs 1970; Loève 1977/1978). Let us now clarify if the
converse statement holds. For this purpose we shall use the following
theorem of G. Pólya (see Lukacs 1970; Feller 1971).

Let $\psi(t)$, $t \in \mathbb{R}^1$ be a real-valued continuous function
such that: (i) $\psi(0) = 1$; (ii) $\psi(-t) = \psi(t)$; (iii) $\psi(t)$ is
convex for $t > 0$; (iv) $\lim_{t\to\infty}\psi(t) = 0$. Then $\psi$ is a
ch.f. of a distribution which is absolutely continuous.

Take for example the following two functions:

$$\psi_1(t) = \frac{1}{1 + |t|}, \ \text{if } t \in \mathbb{R}^1 \quad \text{and} \quad \psi_2(t) = \begin{cases} 1 - |t|, & \text{if } 0 \le |t| \le \tfrac12 \\ \dfrac{1}{4|t|}, & \text{if } |t| \ge \tfrac12. \end{cases}$$

According to the result cited above we conclude that $\psi_1$ and $\psi_2$
are ch.f.s which correspond to absolutely continuous distributions.
However, it is easy to check that $\psi_1$ and $\psi_2$ are not absolutely
integrable.

Finally, suppose $X$ is a r.v. exponentially distributed,
$X \sim Exp(\lambda)$. By definition $X$ is absolutely continuous (its
density is $\lambda e^{-\lambda x}$, $x > 0$). However its ch.f. is equal to
$\lambda/(\lambda - it)$, and obviously this function is not absolutely
integrable on $\mathbb{R}^1$.

Therefore the absolute integrability condition for the ch.f. is
sufficient but not necessary for the corresponding d.f. to be
absolutely continuous.
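
For the exponential case the failure of integrability can be made concrete: $|\lambda/(\lambda - it)| = \lambda/\sqrt{\lambda^2 + t^2}$, and its integral over $[-T, T]$ equals $2\lambda\,\mathrm{arcsinh}(T/\lambda)$, which grows like $\log T$. The sketch below takes $\lambda = 1$ and compares a midpoint-rule approximation with this closed form; the helper names, the step count and the choice of $T$ values are assumptions of this illustration.

```python
import math

def abs_exp_chf(t, lam=1.0):
    """|lambda / (lambda - i t)| for the Exp(lambda) ch.f."""
    return lam / math.hypot(lam, t)

def integral_abs(f, T, steps=100000):
    """Midpoint rule for the integral of f over [-T, T]."""
    h = 2 * T / steps
    return h * sum(f(-T + (j + 0.5) * h) for j in range(steps))

cutoffs = (10.0, 100.0, 1000.0)
truncated = [integral_abs(abs_exp_chf, T) for T in cutoffs]
closed = [2 * math.asinh(T) for T in cutoffs]   # closed form for lambda = 1
```

Each tenfold increase of $T$ adds about $2\log 10 \approx 4.6$ to the truncated integral, so the integral over the whole line is infinite.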


8.7. A discrete distribution without a first-order moment 
but with a differentiable characteristic function 


This and the next example are given to show that the existence of
the derivative $\varphi^{(n)}(0)$ for odd $n$ does not necessarily imply
that the moment $m_n = E[X^n]$ exists. To see this, consider the r.v. $X$
with

$$P[X = \pm n] = \frac{c}{2n^2\log n}, \quad n = 2, 3, \ldots, \quad c^{-1} = \sum_{n=2}^{\infty}\frac{1}{n^2\log n}.$$

Then the ch.f. $\varphi$ of $X$ is

(1) $$\varphi(t) = c\sum_{n=2}^{\infty}\frac{\cos nt}{n^2\log n}, \quad t \in \mathbb{R}^1.$$

Since the partial sums of the series $\sum_{n=2}^{\infty}(\sin nt)/n$ are
uniformly bounded, the series $\sum_{n=2}^{\infty}(\sin nt)/(n\log n)$
obtained from (1) by differentiation is uniformly convergent (see Zygmund
1968). This implies the uniform differentiability of the ch.f.
$\varphi(t)$ for all $t \in \mathbb{R}^1$. In particular, if $t = 0$,
$\varphi'(0) = 0$ but the expectation $EX$ does not exist because the
series $\sum_{n=2}^{\infty} 1/(n\log n)$ is divergent.
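
The divergence of $\sum 1/(n\log n)$ is very slow, which a short computation makes visible. The sketch below compares the growth of the partial sums between $N = 10^3$ and $N = 10^5$ with the integral-test prediction $\log\log N$; the function name and the two cut-offs are assumptions of this illustration.

```python
import math

def partial_sum(N):
    """Sum of 1/(n log n) for n = 2..N; up to the factor c this is the
    truncated absolute moment E[|X|; |X| <= N]."""
    return sum(1.0 / (n * math.log(n)) for n in range(2, N + 1))

s3, s5 = partial_sum(10 ** 3), partial_sum(10 ** 5)
growth = s5 - s3
# Integral test: the sum over (10^3, 10^5] is close to the change in log log x.
predicted = math.log(math.log(10 ** 5)) - math.log(math.log(10 ** 3))
```

The partial sums keep increasing at the rate $\log\log N$, so they are unbounded even though a hundredfold increase of $N$ adds only about 0.51.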


8.8. An absolutely continuous distribution without 
expectation but with a differentiable characteristic 
function 


(i) Let $X$ be a r.v. with the following density:

$$f(x) = \begin{cases} 0, & \text{if } |x| \le 2 \\ c/(x^2\log|x|), & \text{if } |x| > 2 \end{cases}$$

where $c$ is a norming constant, $0 < c < \infty$ (the exact value is not
essential). Since $\int_2^{\infty}(x\log x)^{-1}\,dx = \infty$, the
expectation $EX$ does not exist. Nevertheless we can ask whether the ch.f.
$\varphi(t)$ of $X$ is differentiable at $t = 0$. Since

$$\varphi(t) = 2c\int_2^{\infty}\frac{\cos tx}{x^2\log x}\,dx$$

is even, we can write the difference $[1 - \varphi(t)]/(2c)$ for $t > 0$ in
the following way:

$$\frac{1 - \varphi(t)}{2c} = \int_2^{1/t}\frac{1 - \cos tx}{x^2\log x}\,dx + \int_{1/t}^{\infty}\frac{1 - \cos tx}{x^2\log x}\,dx.$$

Obviously $1 - \varphi(t)$ is a real-valued and non-negative function. For
an arbitrary $u \in \mathbb{R}^1$ we have
$0 \le 1 - \cos u \le \min\{2, u^2\}$. This implies that $1 - \varphi(t)$ is
not greater than some constant multiplied by the function $h(t)$ where

$$h(t) = t^2\int_2^{1/t}\frac{dx}{\log x} + 2\int_{1/t}^{\infty}\frac{dx}{x^2\log x}.$$

However, since $h(t) = O(-t/\log t) = o(t)$ as $t \to 0$, we find that

$$\varphi(t) = 1 + o(t) \quad \text{as } t \to 0.$$

Therefore the ch.f. $\varphi(t)$ is differentiable at $t = 0$ and
$\varphi'(0) = 0$.

(ii) Let us extend case (i). Suppose now that $X$ is a r.v. with the
following density ($K$ is a norming constant):

$$f(x) = \begin{cases} 0, & \text{if } |x| \le 2 \\ K/(x^4\log|x|), & \text{if } |x| > 2. \end{cases}$$

It can be shown that the ch.f.
$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}f(x)\,dx$, $t \in \mathbb{R}^1$
is differentiable at $t = 0$ three times and e.g. $\varphi^{(3)}(0) = 0$
($\varphi^{(4)}$ does not exist at $t = 0$). However
$E[|X|^3] = \int_{-\infty}^{\infty}|x|^3 f(x)\,dx = \infty$, i.e. $m_3$, the
third-order moment of $X$, does not exist. (For details see Rao 1984.)


8.9. The convolution of two indecomposable 
distributions can even have a normal component 

Let $F_1$, $F_2$ be d.f.s and $\varphi_1$, $\varphi_2$ their ch.f.s
respectively. If at least one of $\varphi_1$, $\varphi_2$ is decomposable,
then the convolution $F_1 * F_2$ has a ch.f. $\varphi_1\varphi_2$ which is
also decomposable. If $F_1$ and $F_2$ are both indecomposable, is it true
that $F_1 * F_2$ is indecomposable? Regardless of our intuition we shall
show that $F_1 * F_2$ can contain a decomposable component which in
particular can be chosen to be normal. To see this, let us consider the
d.f. $F$ with the following ch.f.: $\varphi(t) = (1 - t^2)e^{-t^2/2}$,
$t \in \mathbb{R}^1$. According to Linnik and Ostrovskii (1977) any ch.f. of
the form $(1 - b^2t^2)\exp[ict - b^2t^2/2]$ where $b, c \in \mathbb{R}^1$,
$b \ne 0$, is indecomposable. So, $\varphi$ is indecomposable. Denote by
$\psi$ the ch.f. of the d.f. $G := F * F$. Then
$\psi(t) = \varphi^2(t) = (1 - t^2)^2 e^{-t^2}$. Write $\psi(t)$ in the form
$\psi(t) = \psi_1(t)\psi_2(t)$ where

$$\psi_1(t) = (1 - t^2)^2\exp(-3t^2/4), \quad \psi_2(t) = \exp(-t^2/4).$$

It is then not difficult to check that the integral
$\int_{-\infty}^{\infty}\psi_1(t)\exp(-itx)\,dt$ is real-valued and
non-negative for all $x \in \mathbb{R}^1$. This implies that $\psi_1$ is a
ch.f. of some distribution (it is not important which one). On the other
hand, $\psi_2$ can be identified as the ch.f. of the normal distribution
$N(0, \tfrac12)$ since in general the normal distribution $N(a, \sigma^2)$
has a ch.f. equal to $\exp[iat - \tfrac12\sigma^2t^2]$,
$t \in \mathbb{R}^1$.

Hence the indecomposability property is not preserved under
convolution.

The same example considered above can be interpreted as
follows. Let $X_1$, $X_2$ be independent r.v.s with a common d.f. $F$.
Then the sum $X_1 + X_2$ has a d.f. $G$ and, moreover, the following
relation holds:

$$X_1 + X_2 \overset{d}{=} Y_1 + Y_2$$

where $Y_1$, $Y_2$ are independent r.v.s such that $Y_1$ has a ch.f.
$\psi_1$, $Y_2 \sim N(0, \tfrac12)$.
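
The claim that $\psi_1$ is a ch.f. can be checked numerically. Inverting $\psi_1$ term by term (each factor $t^2$ corresponds to $-d^2/dx^2$ applied to the $N(0, \tfrac32)$ density) suggests the candidate density $(1 - 4x^2/9)^2$ times the $N(0, \tfrac32)$ density. This closed form is a calculation made for this illustration and is not stated in the text; the sketch below only tests it against $\psi_1$ at a few points. The function names, the integration window and the step count are likewise assumptions.

```python
import math

S = 1.5  # variance of the Gaussian factor exp(-3 t^2 / 4) = exp(-S t^2 / 2)

def density(x):
    """Candidate density for psi_1: (1 - 4x^2/9)^2 times the N(0, 3/2) density."""
    gauss = math.exp(-x * x / (2 * S)) / math.sqrt(2 * math.pi * S)
    return (1 - 4 * x * x / 9) ** 2 * gauss

def psi1(t):
    return (1 - t * t) ** 2 * math.exp(-0.75 * t * t)

def chf_from_density(t, half_width=15.0, steps=30000):
    """Equally spaced sum approximating the integral of cos(t x) * density(x);
    the density is even, so the imaginary part vanishes."""
    h = 2 * half_width / steps
    xs = (-half_width + j * h for j in range(steps + 1))
    return h * sum(math.cos(t * x) * density(x) for x in xs)

gaps = [abs(chf_from_density(t) - psi1(t)) for t in (0.0, 0.7, 1.0, 2.3)]
```

Since the candidate density is a square times a normal density, it is non-negative, in agreement with the non-negativity of the inversion integral asserted above.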


8.10. Does the existence of all moments of a distribution
guarantee the analyticity of its characteristic and
moment generating functions?

Let $X$ be a r.v. with ch.f. $\varphi$ and m.g.f. $M$. Then if $\varphi(t)$
and $M(z)$ are analytic functions for $|t| < t_0$ or $|z| < r_0$ with
$t_0 > 0$, $r_0 > 0$, the r.v. $X$ possesses moments of all orders. Thus we
come to the question of whether the converse of the statement is true.

Suppose $Z$ is a r.v. with density

$$f(x) = \begin{cases} 0, & \text{if } x < 0 \\ \tfrac12\exp(-\sqrt{x}), & \text{if } x \ge 0. \end{cases}$$

Then we have $m_k = E[Z^k] = (2k+1)!$, $k = 0, 1, 2, \ldots$, hence $Z$
possesses moments of any order. For clarifying the properties of
the ch.f. of $Z$ we need the following result (see Laha and Rohatgi
1979). The ch.f. $\varphi$ of the r.v. $X$ is analytic iff: (a) $X$ has
moments $m_k$ of any order $k$, $k \ge 1$; (b) there exists a constant
$c > 0$ such that $|m_k| \le k!\,c^k$ for all $k \ge 1$.

Since in our example $m_k = (2k+1)!$ we can easily find that

$$(|m_k|/k!)^{1/k} = [(2k+1)!/k!]^{1/k} = \Big[\prod_{j=1}^{k+1}(k + j)\Big]^{1/k} > k$$

and clearly condition (b) in the above result is not satisfied.
Therefore the ch.f. $\varphi$ of $Z$ cannot be analytic. It follows that the
m.g.f. $M$ does not exist. Note that the last statement can be derived
directly. Indeed, $M(z)$ can be written in the form

$$M(z) = \tfrac12\int_0^{\infty}\exp(zx - \sqrt{x})\,dx.$$

If $\varepsilon > 0$ is small enough then for every $z$ with
$0 < z < \varepsilon$ we have $zx - \sqrt{x} \to \infty$ as $x \to \infty$.
This implies that $\int_0^{\infty}\exp(zx - \sqrt{x})\,dx = \infty$.
Therefore $M(z)$ does not exist in spite of the fact that all moments of
$Z$ do exist.
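
Condition (b) requires the quantity $(|m_k|/k!)^{1/k}$ to stay bounded in $k$. With $m_k = (2k+1)!$ it can be tabulated directly; the sketch below works with log-gamma values to avoid overflow. The function name and the chosen values of $k$ are assumptions of this illustration.

```python
import math

def growth(k):
    """(m_k / k!)^(1/k) with m_k = (2k+1)!, computed through log-gamma
    because the factorials themselves overflow quickly."""
    return math.exp((math.lgamma(2 * k + 2) - math.lgamma(k + 1)) / k)

values = {k: growth(k) for k in (1, 5, 25, 125)}
```

The tabulated values exceed $k$ and keep increasing, so no constant $c$ can dominate them.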


Finally, let us show a case which is an extension of the above
example. Suppose $U$ is a r.v. with density

$$g(x) = c\exp(-|x|^{\gamma}), \quad x \in \mathbb{R}^1$$

where $0 < \gamma < 1$ and $c$ is a norming constant,
$c^{-1} := \int_{-\infty}^{\infty}\exp(-|x|^{\gamma})\,dx$. Then
$E[|U|^k] < \infty$ for every $k \ge 1$, so $U$ possesses moments of any
order. Nevertheless, the ch.f. of $U$ is not analytic and consequently the
m.g.f. of $U$ does not exist.


SECTION 9. INFINITELY DIVISIBLE AND STABLE 
DISTRIBUTIONS 


Let $X$ be a r.v. with d.f. $F$ and ch.f. $\varphi$. We say that $X$, as
well as $F$ and $\varphi$, are infinitely divisible if for each $n \ge 1$
there exist i.i.d. r.v.s $X_{n1}, \ldots, X_{nn}$ such that

$$X \overset{d}{=} X_{n1} + \cdots + X_{nn}$$

or equivalently, if for a d.f. $F_n$ and a ch.f. $\varphi_n$,

$$F = F_n * \cdots * F_n = (F_n)^{*n}, \quad \varphi = (\varphi_n)^n.$$

Let us note the following properties.

(i) A distribution $F$ with bounded support is infinitely divisible
iff it is degenerate.
(ii) The infinitely divisible ch.f. does not vanish.
(iii) The product of a finite number of infinitely divisible ch.f.s
is a ch.f. which is again infinitely divisible.
(iv) The r.v. $X$ can be a limit of sums $S_n = \sum_{k=1}^{n} X_{nk}$ iff
$X$ is infinitely divisible.


Fundamental in this field is the following result (see Feller 1971;
Chow and Teicher 1978; Shiryaev 1995). The r.v. $X$ with ch.f. $\varphi$ is
infinitely divisible iff $\varphi$ admits the following canonical
representation known as the Lévy-Khintchine representation:

(1) $$\varphi(t) = \exp\left\{i\gamma t + \int_{-\infty}^{\infty}\left(e^{itu} - 1 - \frac{itu}{1 + u^2}\right)\frac{1 + u^2}{u^2}\,dG(u)\right\}$$

where $\gamma \in \mathbb{R}^1$, and $G(x)$, $x \in \mathbb{R}^1$, is a
non-decreasing left-continuous function of bounded variation and
$G(-\infty) = 0$.

Now let us introduce another notion. The r.v. $X$, its d.f. $F$ and its
ch.f. $\varphi$ are called stable if for every $n \ge 1$ there exist
constants $a_n$ and $b_n > 0$ and independent r.v.s $X_1, \ldots, X_n$
distributed like $X$ such that

$$b_nX + a_n \overset{d}{=} X_1 + \ldots + X_n$$

or, equivalently, $F((x - a_n)/b_n) = [F(x)]^{*n}$, or
$[\varphi(t)]^n = \varphi(b_nt)e^{ia_nt}$.

The basic result concerning stable distributions is as follows (see
Chow and Teicher 1978; Zolotarev 1986). The r.v. $X$ with ch.f. $\varphi$
is stable iff $\varphi$ admits the following canonical representation:

(2) $$\varphi(t) = \exp\left\{i\gamma t - c|t|^{\alpha}\left[1 + i\beta\frac{t}{|t|}\omega(t, \alpha)\right]\right\}$$

where $\gamma \in \mathbb{R}^1$, $0 < \alpha \le 2$, $|\beta| \le 1$,
$c \ge 0$ and

$$\omega(t, \alpha) = \begin{cases} \tan\tfrac12\pi\alpha, & \text{if } \alpha \ne 1 \\ \dfrac{2}{\pi}\log|t|, & \text{if } \alpha = 1. \end{cases}$$

Recall that (2) is also known as the Lévy-Khintchine
representation. In particular, if $\gamma = 0$, $\beta = 0$, we obtain the
symmetric stable distributions. They have ch.f.s of the type
$\exp(-c|t|^{\alpha})$ where $c > 0$, $0 < \alpha \le 2$.

A detailed investigation of the infinitely divisible distributions 


and the stable distributions can be found in the books by Gnedenko 
and Kolmogorov (1954), Lukacs (1970), Feller (1971), Linnik and 
Ostrovskii (1977), Loéve (1978), Chow and Teicher (1978) and 
Zolotarev (1986). 

The next examples illustrate different properties of infinitely 
divisible and stable distributions. Two examples deal with random 
vectors. 


9.1. A non-vanishing characteristic function which is not 
infinitely divisible 


Let the r.v. $X$ with ch.f. $\varphi(t)$, $t \in \mathbb{R}^1$, be
infinitely divisible. Then $\varphi$ does not vanish. The example below
shows that in general the converse is not true.

Consider the discrete r.v. $X$ which takes the values −1, 0, 1 with
probabilities $\tfrac18$, $\tfrac34$, $\tfrac18$ respectively. The ch.f.
$\varphi$ of $X$ is

$$\varphi(t) = \tfrac18e^{-it} + \tfrac34e^{0} + \tfrac18e^{it} = \tfrac14(3 + \cos t).$$

Obviously $\varphi(t) > 0$ for all $t \in \mathbb{R}^1$, so $\varphi$ does
not vanish. Nevertheless, $X$ is not infinitely divisible. To see this, let
us assume that $X$ can be written as

(1) $$X \overset{d}{=} X_1 + X_2$$

where $X_1$ and $X_2$ are i.i.d. r.v.s. Since $X$ has three possible values,
it is clear that each of $X_1$ and $X_2$ can take only two values, say $a$
and $b$, $a < b$. Let $P[X_1 = a] = p$, $P[X_1 = b] = 1 - p$ for some $p$,
$0 < p < 1$. Then $X_1 + X_2$ takes the values $2a$, $a + b$ and $2b$ with
probabilities $p^2$, $2p(1 - p)$ and $(1 - p)^2$ respectively. So it should
be

$$2a = -1, \quad a + b = 0, \quad 2b = 1, \quad p^2 = \tfrac18, \quad 2p(1 - p) = \tfrac34, \quad (1 - p)^2 = \tfrac18$$

which are clearly incompatible. Hence the representation (1) is not
possible, implying that $X$ is not infinitely divisible.
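
The incompatibility is a one-line computation: the first probability condition fixes $p$, and the other two then fail. A minimal sketch (the variable names are assumptions of this illustration):

```python
import math

p = math.sqrt(1 / 8)          # forced by the condition p^2 = 1/8
middle = 2 * p * (1 - p)      # would have to equal 3/4
upper = (1 - p) ** 2          # would have to equal 1/8
```

Here `middle` is about 0.457 and `upper` about 0.418, far from the required 3/4 and 1/8.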


9.2. If |φ| is an infinitely divisible characteristic function,
this does not always imply that φ is also infinitely
divisible

Recall that if $\varphi$ is an infinitely divisible ch.f. then its absolute
value $|\varphi|$ is so. It is not so trivial that in general the converse
statement is false. This was discovered by Gnedenko and Kolmogorov (1954)
and we present here their example of a ch.f. $\varphi$ such that
$|\varphi|$ is infinitely divisible, but $\varphi$ is not. Consider the
function

(1) $$\varphi(t) = \frac{1 - b}{1 + a}\cdot\frac{1 + ae^{-it}}{1 - be^{it}}, \quad t \in \mathbb{R}^1$$

where $0 < a \le b < 1$. Obviously $\varphi$ is continuous,
$\varphi(0) = 1$ and

$$\varphi(t) = \frac{1 - b}{1 + a}\left[ae^{-it} + (1 + ab)\sum_{k=0}^{\infty}b^ke^{itk}\right].$$

It follows that $\varphi$ is the ch.f. of a r.v. $X$ with

$$P[X = -1] = \frac{(1 - b)a}{1 + a}, \quad P[X = k] = \frac{(1 - b)(1 + ab)b^k}{1 + a}, \quad k = 0, 1, 2, \ldots.$$

Furthermore,

(2) $$\log\varphi(t) = \sum_{k=1}^{\infty}\left[(-1)^{k-1}\frac{a^k}{k}(e^{-itk} - 1) + \frac{b^k}{k}(e^{itk} - 1)\right].$$

We can also write $\log\varphi(t)$ in its canonical form (see the
introductory notes to this section; the Lévy-Khintchine formula) by
taking $\gamma = \sum_{k=1}^{\infty}(b^k + (-1)^ka^k)/(k^2 + 1)$ and $G(x)$
to be a function of bounded variation with jumps of size $kb^k/(k^2 + 1)$
at $x = k$ and $(-1)^{k-1}ka^k/(k^2 + 1)$ at $x = -k$ for
$k = 1, 2, \ldots$. However, $G$ is not monotone, which automatically
implies that $\varphi$ cannot be infinitely divisible.

Furthermore, the function

$$\overline{\varphi(t)} = \frac{1 - b}{1 + a}\cdot\frac{1 + ae^{it}}{1 - be^{-it}}$$

is also a ch.f. but not infinitely divisible. Our next step is to show
that the function

$$\psi(t) = |\varphi(t)|^2 = \varphi(t)\overline{\varphi(t)}$$

is infinitely divisible. Note that $\psi$ is a ch.f. as a product of two
ch.f.s. It is easy to write firstly $\log\overline{\varphi(t)}$ in the form
(2) and then obtain $\log\psi(t)$, namely

$$\log\psi(t) = \sum_{k=1}^{\infty}k^{-1}\left[b^k + (-1)^{k-1}a^k\right](e^{itk} - 1) + \sum_{k=1}^{\infty}k^{-1}\left[b^k + (-1)^{k-1}a^k\right](e^{-itk} - 1).$$

Thus in the Lévy-Khintchine formula for $\log\psi(t)$ we can take
$\gamma = 0$ and $G(x)$ to be a non-decreasing function with jumps of size
$k(k^2 + 1)^{-1}[b^k + (-1)^{k-1}a^k]$ at the points $x = \pm k$,
$k = 1, 2, \ldots$. Since this representation of $\log\psi(t)$ is unique,
we conclude that the ch.f. $\psi$ is infinitely divisible. Moreover
$|\varphi| = (|\varphi|^2)^{1/2} = \psi^{1/2}$ is also infinitely divisible
despite the fact that $\varphi$ given by (1) is not. Another interesting
observation is that the infinitely divisible ch.f. $\psi$ is the product of
the two non-infinitely divisible ch.f.s $\varphi$ and $\overline{\varphi}$.


9.3. The product of two independent non-negative and 
infinitely divisible random variables is not always 
infinitely divisible 


(i) Define two independent r.v.s $X$ and $Y$ having values in the sets
{0, 1, 2, 3, ...} and {1, 1 + c, 1 + 2c, 1 + 3c, ...} respectively where
$1 < c < \tfrac32$. The corresponding probabilities for $X$ and $Y$ are
$\{p_0, p_1, p_2, \ldots\}$ and $\{q_0, q_1, q_2, \ldots\}$ where
$p_j > 0$, $\sum_j p_j = 1$, $q_j > 0$, $\sum_j q_j = 1$.

Consider the product $Z = XY$ and suppose it is infinitely
divisible. Then

(1) $$Z \overset{d}{=} Z_1 + Z_2$$

where $Z_1$ and $Z_2$ are i.i.d. r.v.s. Evidently, the 'first' six possible
values of $Z$ are 0, 1, 2, 1 + c, 3, 1 + 2c. It follows that 0, 1 and 1 + c
are among the values of $Z_1$ (and hence of $Z_2$). But this implies that
2 + c is a possible value of $Z$. Since 3 < 2 + c < 1 + 2c and $Z$ has no
values strictly between 3 and 1 + 2c, we get a contradiction. Consequently
a relation similar to (1) is not possible. Thus $Z$ cannot be infinitely
divisible.

Notice that $X$ and $Y$ take their values from different sets. The
same answer concerning the non-infinite divisibility of the product
$XY$ can be obtained in the case of $X$ and $Y$ taking values in the same
space.


(ii) Let us exhibit now an example in which the reasoning is based
on the following (see Katti 1967). Suppose $\{p_n, n \in \mathbb{N}_0\}$ is
a distribution with $p_0 > 0$ and $p_1 > 0$. Then $\{p_n\}$ is infinitely
divisible iff the numbers $r_k$, $k = 0, 1, \ldots$, defined by

(2) $$(n + 1)p_{n+1} = \sum_{k=0}^{n}r_kp_{n-k}, \quad n = 0, 1, 2, \ldots$$

are all non-negative.

Let us use this result to prove a new and not too well known
statement: let $\xi$ and $\eta$ be independent r.v.s each having a Poisson
distribution $Po(\lambda)$. Then both $\xi$ and $\eta$ are infinitely
divisible, but the product $X = \xi\eta$ is not.

Indeed, take $n > 1$ such that $n + 1$ is a prime number. Then

$$p_{n+1} = P[X = n + 1] = P[\xi\eta = n + 1] = P[\xi = 1, \eta = n + 1] + P[\xi = n + 1, \eta = 1] = 2\lambda^{n+2}e^{-2\lambda}/(n + 1)!.$$

The number $n$ itself is even and hence $n$ has at least two
(integer) factorizations: $n = 1\cdot n = 2\cdot(n/2)$. Therefore

$$p_n = P[X = n] \ge 2\cdot(\lambda^2e^{-\lambda}/2!)\cdot(\lambda^{n/2}e^{-\lambda}/(n/2)!) = \lambda^{n/2+2}e^{-2\lambda}/(n/2)!.$$

Obviously $p_0 = P[X = 0] = 1 - (1 - e^{-\lambda})^2 > 0$,
$p_1 = \lambda^2e^{-2\lambda} > 0$, and so $r_0 = p_1/p_0 > 0$. Further,
suppose that $r_1, r_2, \ldots, r_{n-1}$ in (2) are all non-negative. Let
us check the sign of $r_n$. We have

$$r_np_0 \le (n + 1)p_{n+1} - r_0p_n \le 2\lambda^{n+2}e^{-2\lambda}/n! - r_0\lambda^{n/2+2}e^{-2\lambda}/(n/2)! = e^{-2\lambda}\lambda^{n/2+2}\left[2\lambda^{n/2}/n! - r_0/(n/2)!\right].$$

Since $\lambda > 0$ is fixed and $1/n!$ goes to zero as $n \to \infty$
faster than $1/(n/2)!$, we conclude that for sufficiently large $n$ the
number $r_n$ becomes negative. This does not agree with the property in (2)
that all $r_n$ are non-negative. Hence the product $\xi\eta$ of two
independent Poisson r.v.s is not infinitely divisible.
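
Katti's criterion is easy to run on a computer. The sketch below computes the distribution of $\xi\eta$ for the illustrative choice $\lambda = 1$ (this value, the function names and the truncation at eight probabilities are assumptions, not part of the text), solves recursion (2) for $r_0, r_1, \ldots$ and looks for the first negative coefficient.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def product_pmf(n, lam):
    """P[xi * eta = n] for independent Poisson(lam) r.v.s xi and eta."""
    if n == 0:
        return 1 - (1 - math.exp(-lam)) ** 2
    # For n >= 1 sum over all factorizations n = d * (n // d).
    return sum(poisson_pmf(d, lam) * poisson_pmf(n // d, lam)
               for d in range(1, n + 1) if n % d == 0)

def katti_r(p):
    """Solve (n+1) p[n+1] = sum_{k=0}^{n} r[k] p[n-k] for r[0], r[1], ..."""
    r = []
    for n in range(len(p) - 1):
        known = sum(r[k] * p[n - k] for k in range(n))
        r.append(((n + 1) * p[n + 1] - known) / p[0])
    return r

lam = 1.0   # illustrative value only
p = [product_pmf(n, lam) for n in range(8)]
r = katti_r(p)
first_negative = next(i for i, value in enumerate(r) if value < 0)
```

For $\lambda = 1$ the coefficients $r_0, \ldots, r_3$ are positive and $r_4$ is negative; note that $n = 4$ is even and $n + 1 = 5$ is prime, as in the argument above.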


9.4. Infinitely divisible products of non-infinitely 
divisible random variables 


There are many examples of the following kind: if $X$ is a r.v. which
is absolutely continuous and infinitely divisible and $X_1$, $X_2$ are
independent copies of $X$, then the product $X_1X_2$ is again infinitely
divisible.

As a first example take $X \sim N(0, 1)$. Then $X_1X_2$ has a ch.f. equal
to $1/(1 + t^2)^{1/2}$ and hence $X_1X_2$ is infinitely divisible.

As a second example, take $X \sim C(0, 1)$, i.e. $X$ has a Cauchy
density $f(x) = 1/(\pi(1 + x^2))$, $x \in \mathbb{R}^1$. If $X_1$ and $X_2$
are independent copies of $X$, it can be checked that the ch.f. of the
product $X_1X_2$ is infinitely divisible.

These and other examples (discussed by Rohatgi et al (1990))
lead to the following question. Suppose $X_1$ and $X_2$ are independent
copies of the absolutely continuous r.v. $X$. Suppose further that the
product $Y = X_1X_2$ is infinitely divisible. Does this imply that $X$
itself is infinitely divisible?

Let $Y$ be a r.v. distributed $N(0, 1)$. Then there exists a r.v. $X$ such
that by taking two independent copies, $X_1$ and $X_2$, we obtain
$X_1X_2 \overset{d}{=} Y$ (for details see Groeneboom and Klaassen (1982)).
Thus $P[|Y| > x^2] \ge (P[|X| > x])^2$ which implies that

$$P[|X| > x] \le (P[|Y| > x^2])^{1/2} = O(e^{-x^4/4}) \quad \text{as } x \to \infty.$$

Referring to the paper of Steutel (1973) for details we conclude
that $X$ cannot be infinitely divisible. Hence the answer to the above
question is negative.


9.5. Every distribution without indecomposable
components is infinitely divisible, but the converse is
not true

Following tradition, denote by $I_0$ the class of distributions which
have no indecomposable components. Recall that $F \in I_0$ means
that the ch.f. $\varphi$ of $F$ cannot be represented in the form
$\varphi = \varphi_1\varphi_2$ where $\varphi_1$ and $\varphi_2$ are ch.f.s
of non-degenerate distributions. Detailed study of the class $I_0$ is due
to A. Ya. Khintchine. In this connection see Linnik and Ostrovskii (1977)
where among a variety of results, the following theorem is proved: the
class $I_0$ is a subclass of the class of infinitely divisible
distributions.

Let us show that this inclusion is strict. Indeed, take the
following ch.f.:

(1) $$\varphi(t) = \frac{1 - a}{1 - ae^{it}}, \quad 0 < a < 1, \quad t \in \mathbb{R}^1.$$

The representation

$$\varphi(t) = \exp[\log(1 - a) - \log(1 - ae^{it})] = \exp\left\{\sum_{n=1}^{\infty}\frac{a^n}{n}(e^{itn} - 1)\right\}$$

shows that $\varphi$ is a limit of products of ch.f.s corresponding to
Poisson distributions. Then (see Gnedenko and Kolmogorov 1954;
Loève 1977/1978) the ch.f. $\varphi$ is infinitely divisible.

Further, the identity
$1/(1 - x) = \prod_{k=0}^{\infty}(1 + x^{2^k})$, $|x| < 1$,
implies that

$$\varphi(t) = \prod_{k=0}^{\infty}\left(1 + a^{2^k}e^{it2^k}\right)/\left(1 + a^{2^k}\right).$$

Recall that $(1 + a^{2^k}e^{it2^k})/(1 + a^{2^k})$ is the ch.f. of a
distribution concentrated at two points, namely 0 and $2^k$. However, such
a distribution is indecomposable (see Example 9.1). Hence the ch.f.
$\varphi$ defined by (1) is infinitely divisible but $\varphi$ does not
belong to the class $I_0$.
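
The product representation can be verified numerically. The sketch below multiplies the first few two-point ch.f.s and compares the result with the geometric ch.f. (1); the value $a = 0.6$, the number of factors and the function names are assumptions chosen only for illustration.

```python
import cmath

def geometric_chf(t, a):
    """Ch.f. (1): (1 - a) / (1 - a e^{it})."""
    return (1 - a) / (1 - a * cmath.exp(1j * t))

def two_point_product(t, a, factors=12):
    """Product of the first `factors` two-point ch.f.s
    (1 + a^(2^k) e^{i t 2^k}) / (1 + a^(2^k)), k = 0, 1, ..."""
    value = 1.0 + 0j
    for k in range(factors):
        w = a ** (2 ** k)   # weight of the atom at 2^k, before normalisation
        value *= (1 + w * cmath.exp(1j * t * 2 ** k)) / (1 + w)
    return value

a = 0.6   # illustrative value only
gaps = [abs(geometric_chf(t, a) - two_point_product(t, a))
        for t in (0.3, 1.0, 2.5)]
```

The weights $a^{2^k}$ decay doubly exponentially, so a dozen factors already reproduce (1) to machine precision.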


9.6. A non-infinitely divisible random vector with 
infinitely divisible subsets of its coordinates 


Let $(X_1, X_2, X_3)$ be a random vector and $\psi(t_1, t_2, t_3)$,
$t_1, t_2, t_3 \in \mathbb{R}^1$ its ch.f.:

$$\psi(t_1, t_2, t_3) = E\{\exp[i(t_1X_1 + t_2X_2 + t_3X_3)]\}.$$

The vector $(X_1, X_2, X_3)$ is said to be infinitely divisible if for each
$\alpha > 0$, $\psi^{\alpha}(t_1, t_2, t_3)$ is again a ch.f. Obviously this
notion can be introduced for random vectors in $\mathbb{R}^n$ with
$n > 3$. We confine ourselves to the three-dimensional case for simplicity.

Let us note that if $(X_1, X_2, X_3)$ is infinitely divisible, then each
subset of its coordinates $X_1$, $X_2$, $X_3$ is infinitely divisible. This
follows easily from the properties of the usual one-dimensional
infinitely divisible distributions. Thus it is natural to ask whether
the converse statement is true.

Consider two independent r.v.s $X$ and $Y$ each $N(0, 1)$. Let

$$Z_1 = X^2, \quad Z_2 = XY, \quad Z_3 = Y^2.$$

It is easy to check that each of $Z_1$, $Z_2$, $Z_3$ is infinitely
divisible. Moreover, any of the two-dimensional random vectors
$(Z_1, Z_2)$, $(Z_1, Z_3)$ and $(Z_2, Z_3)$ is also infinitely divisible.
However, the vector $(Z_1, Z_2, Z_3)$ is indecomposable; it has a
trivariate gamma distribution which is not infinitely divisible. For
details we refer the reader to works by Lévy (1948), Griffiths (1970) and
Rao (1984).


9.7. A non-infinitely divisible random vector with 
infinitely divisible linear combinations of its 
components 


If $(X, Y)$ is an infinitely divisible random vector, then any linear
combination $Z = a_1X + a_2Y$, $a_1, a_2 \in \mathbb{R}^1$ is an infinitely
divisible r.v. The question to be considered is whether the converse is
true. This problem, posed by C.R. Rao, has been solved by Ibragimov
(1972). The following example shows that the answer is negative.

For z = (41,22) € R*, |a| = (2? + 22)1/? andO<e< ;, define 
the function 


—eor ¢+e< |2| <1 


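Let $A_{\varepsilon}(U)$, $U \subset \mathbb{R}^2$, be the signed measure
with density $a_{\varepsilon}$, that is

$$A_{\varepsilon}(U) = \int_U a_{\varepsilon}(x)\,dx$$

and also introduce the function

(1) $$\psi_{\varepsilon}(t) = \exp\left\{\int_{\mathbb{R}^2}\left(e^{i(t,x)} - 1\right)dA_{\varepsilon}(x)\right\}, \quad t \in \mathbb{R}^2, \quad (t, x) = t_1x_1 + t_2x_2.$$

For all sufficiently small $\varepsilon > 0$, $\psi_{\varepsilon}$ is
positive definite and hence $\psi_{\varepsilon}$ is the ch.f. of some d.f.
$F_{\varepsilon}$ in $\mathbb{R}^2$. Indeed, from (1)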


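$$\psi_{\varepsilon}(t) = c\left[1 + \sum_{n=1}^{\infty}\frac{1}{n!}\left(\int_{\mathbb{R}^2}e^{i(t,x)}a_{\varepsilon}(x)\,dx\right)^n\right]$$

where $c = \exp[-A_{\varepsilon}(\mathbb{R}^2)]$. Thus $F_{\varepsilon}$ can
be written in the form $F_{\varepsilon} = c(G_0 + G_{\varepsilon})$ where
$G_0$ is a probability measure with $G_0(\{0\}) = 1$ and $G_{\varepsilon}$
is a measure with density
$\gamma_{\varepsilon}(x) = \sum_{n=1}^{\infty}a_{\varepsilon}^{*n}(x)/n!$.
Furthermore we can check that for all small $\varepsilon$,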


a) (x) = | ae(a—u)ae(u)du>0 and &®) (x) = 4 * a(x) > 0. 
JR? 


~(n—2) 


~(n) : ~(2)¢ 
Hence for n = 4 we have Ge (x) = Ge 7 ais (x) > 0. For 
. x(2)r,, 
small ¢, a , (x) is close to the function “o (x). It is easy to see that 


a. : 3 mg wh 
fora = © < Z| we have Mla 4% (z) = c > 0. Thus for small 


- gi4)¢, To | Tf oe ce he 
e, dz (x) > 22 if 3 —¢ < |r| S y+. Byidently this implies 


that $\gamma_{\varepsilon}(x) \ge 0$ for all $x \in \mathbb{R}^2$.
Therefore $F_{\varepsilon}$ described above is a probability measure in
$\mathbb{R}^2$.

Denote by $(X_1, X_2)$ a random vector with d.f. $F_{\varepsilon}$. Since
$A_{\varepsilon}$ is a signed measure (its values are not only positive),
$F_{\varepsilon}$ cannot be infinitely divisible.

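It remains to be shown that any linear combination
$\alpha_1X_1 + \alpha_2X_2$, $\alpha_1, \alpha_2 \in \mathbb{R}^1$, has a
distribution which is infinitely divisible. Indeed, for
$s \in \mathbb{R}^1$,

$$\varphi_{\alpha}(s) := E\{\exp[is(\alpha_1X_1 + \alpha_2X_2)]\} = \psi_{\varepsilon}(\alpha_1s, \alpha_2s) = \exp\left\{\int_{\mathbb{R}^2}\left(e^{is(\alpha,x)} - 1\right)dA_{\varepsilon}(x)\right\}.$$

Denoting $(\alpha, x) = u$ where $u \in \mathbb{R}^1$ we can write
$\varphi_{\alpha}(s)$ in the form

$$\varphi_{\alpha}(s) = \exp\left\{\int_{-\infty}^{\infty}\left(e^{isu} - 1\right)dH_{\alpha}(u)\right\}, \quad H_{\alpha}(u) = \int_{(\alpha,x) \le u}dA_{\varepsilon}(x).$$

Since for sufficiently small $\varepsilon$ every strip
$\{u_1 \le (\alpha, x) \le u_2\}$ has positive $A_{\varepsilon}$-measure, we
conclude (again see Ibragimov 1972) that the function $H_{\alpha}(u)$,
$u \in \mathbb{R}^1$, is a d.f. and moreover,
$\varphi_{\alpha}(s) = \psi_{\varepsilon}(\alpha_1s, \alpha_2s)$,
$s \in \mathbb{R}^1$, is a ch.f. of a distribution which is infinitely
divisible.

Thus we have established that any linear combination
$\alpha_1X_1 + \alpha_2X_2$ is an infinitely divisible r.v. but
$(X_1, X_2)$ is not an infinitely divisible vector.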


9.8. Distributions which are infinitely divisible but not stable 


Usually we introduce and study the class of infinitely divisible
distributions and then the class of stable distributions. One of the
first observed properties is that every stable distribution is
infinitely divisible. Let us show that the converse is not always
true.

(i) Let $X$ be a r.v. with Poisson distribution, $X \sim Po(\lambda)$, that
is

$$P[X = k] = \frac{\lambda^ke^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

where the parameter $\lambda > 0$ is given. If $\varphi$ is the ch.f. of $X$
then

(1) $$\varphi(t) = \exp[\lambda(e^{it} - 1)], \quad t \in \mathbb{R}^1.$$

Since $\varphi(t) = [\varphi_n(t)]^n$ for
$\varphi_n(t) = \exp[\lambda n^{-1}(e^{it} - 1)]$ and $\varphi_n$ is again a
ch.f. (of $Po(\lambda/n)$) then $X$ is infinitely divisible. However,
$\varphi$ from (1) does not satisfy any relation of the type
$\varphi(b_1t)\varphi(b_2t) = \varphi(bt)e^{i\gamma t}$ (see the
introductory notes). Hence the Poisson distribution is not stable despite
the fact that it is infinitely divisible.


(ii) Let $Y$ be a r.v. with Laplace distribution, that is, its density $g$ is


where 1 € R*, > 0. For the ch.f. w of Y we have 
(2) u(t) =e /(14t77), tER’. 


It is not difficult to verify that $\psi$ is infinitely divisible. But
$\psi$ from (2) does not satisfy any relation of the type
$\psi(b_1t)\psi(b_2t) = \psi(bt)e^{i\gamma t}$ and hence $\psi$ is not
stable.

Therefore the Laplace distribution is an example of an
absolutely continuous distribution which is infinitely divisible
without being stable.


(iii) Suppose the gamma distributed r.v. Z has a density 


| oh 
i= 7g eee e> 
27 


(and g(x) = 0 for x <0). Then by using the explicit form of the ch.f. 
of Z we can show that Z is infinitely divisible but not stable. 

An additional example of a distribution which is infinitely 
divisible but not stable is given in Example 21.8. 


9.9. A stable distribution which can be decomposed into two 
infinitely divisible but not stable distributions 


Let $X$ be a r.v. with Cauchy distribution $C(1, 0)$, that is, its density
is

$$f(x) = \frac{1}{\pi(1 + x^2)}, \quad x \in \mathbb{R}^1.$$

If $\varphi$ denotes the ch.f. of $X$, then $\varphi(t) = e^{-|t|}$,
$t \in \mathbb{R}^1$. It is well known that this distribution is stable.
Let us show that $X$ can be written as

(1) $$X \overset{d}{=} X_1 + X_2$$

where $X_1$ and $X_2$ are independent r.v.s whose distributions are
infinitely divisible but not stable. For this, introduce the following two
functions:

$$\varphi_1(t) = \exp[-|t| + 1 - e^{-|t|}], \quad \varphi_2(t) = \exp[e^{-|t|} - 1].$$

We claim that $\varphi_1$ and $\varphi_2$ are ch.f.s of distributions which
are infinitely divisible. This follows from the fact that each of
$\varphi_1$, $\varphi_2$ can be expressed in the form
$\exp[-\int_0^{|t|}\int_0^{u}\psi(v)\,dv\,du]$ with a suitable integrand
$\psi$ and the only assumption is that $\psi$ is a ch.f. Then our
conclusion concerning $\varphi_1$ and $\varphi_2$ is a consequence of a
result of Lukacs (1970, Th. 12.2.8).
It is easy to verify that $\varphi_1$ and $\varphi_2$ are not stable
ch.f.s. Thus we have

(2) $$\varphi(t) = e^{-|t|} = \varphi_1(t)\varphi_2(t).$$

Now take two independent r.v.s, say $X_1$ and $X_2$, whose ch.f.s are
$\varphi_1$ and $\varphi_2$ respectively. It only remains to see that (2)
implies (1). Therefore we have constructed two r.v.s $X_1$ and $X_2$ which
are independent, both are infinitely divisible but not stable and they
are such that the sum $X_1 + X_2$ has a stable distribution.


SECTION 10. NORMAL DISTRIBUTION 


We say that the r.v. $X$ has a normal distribution with parameters
$a$ and $\sigma^2$, $a \in \mathbb{R}^1$, $\sigma > 0$, if $X$ is
absolutely continuous and has a density

(1) $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x - a)^2}{2\sigma^2}\right], \quad x \in \mathbb{R}^1.$$

In such a case we use the notation $X \sim N(a, \sigma^2)$. It is easy to
write explicitly the d.f. corresponding to (1).

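Consider the particular case when $a = 0$, $\sigma = 1$. We obtain the
functions

(2) $$\varphi(x) = \frac{1}{\sqrt{2\pi}}\exp(-\tfrac12x^2), \quad x \in \mathbb{R}^1,$$

(3) $$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp(-\tfrac12u^2)\,du, \quad x \in \mathbb{R}^1.$$

These two functions, $\varphi$ and $\Phi$, are called a standard normal
density function, and a standard normal d.f. respectively. They correspond
to a r.v. distributed $N(0, 1)$.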
Recall that the r.v. $X \sim N(a, \sigma^2)$ has $EX = a$,
$VX = \sigma^2$ and a ch.f. $\exp(iat - \tfrac12\sigma^2t^2)$. If $a = 0$,
then all odd-order moments are zero, that is,
$m_{2n+1} = E[X^{2n+1}] = 0$, while the even-order moments
are $m_{2n} = E[X^{2n}] = \sigma^{2n}(2n - 1)!!$.

Consider now the random vector $X = (X_1, \ldots, X_n)$. If $EX_i = a_i$,
$i = 1, \ldots, n$, then $a = (a_1, \ldots, a_n)$ is called a mean value
vector (or vector of the expectation) of $X$. The matrix $C = (c_{ij})$
where $c_{ij} = E[(X_i - a_i)(X_j - a_j)]$, $i, j = 1, \ldots, n$ is called
a covariance matrix of $X$. We say that $X$ has an n-dimensional normal
distribution if $X$ possesses a density function

(4) $$f(x_1, \ldots, x_n) = \frac{|D|^{1/2}}{(2\pi)^{n/2}}\exp\left[-\frac12\sum_{i,j=1}^{n}d_{ij}(x_i - a_i)(x_j - a_j)\right], \quad (x_1, \ldots, x_n) \in \mathbb{R}^n.$$

Here the matrix $D = (d_{ij})$ is the inverse matrix to $C$. Clearly, $D$
exists if $C$ is positive definite and $|D| := \det D$.

Note that we could start with the vector
$a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ and the symmetric positive
definite matrix $C = (c_{ij})$, then invert $C$ to yield matrix $D$, and
finally use the vector $a$ and the matrix $D$ to write the function $f$ as
in (4). This function $f$ is an n-dimensional density and thus there is a
random vector, say $(X_1, \ldots, X_n)$, whose density is $f$. By
definition this vector is called normally distributed, and (4) defines an
n-dimensional normal density.

For some of the examples below we need the explicit form of (4)
when $n = 2$. The two-dimensional (or bivariate) normal density can
be written as

(5) $f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x_1 - a_1)^2}{\sigma_1^2} - \frac{2\rho(x_1 - a_1)(x_2 - a_2)}{\sigma_1\sigma_2} + \frac{(x_2 - a_2)^2}{\sigma_2^2}\right]\right\}$

where $\sigma_1, \sigma_2 > 0$ and $|\rho| < 1$. If $(X_1, X_2)$ is a random vector with
density (5) then $EX_1 = a_1$, $EX_2 = a_2$, $VX_1 = \sigma_1^2$, $VX_2 = \sigma_2^2$ and $\rho$
equals the correlation coefficient $\rho(X_1, X_2)$.

The normal distribution over $\mathbb{R}^1$ and $\mathbb{R}^n$ is considered in almost
all textbooks and lecture notes. We refer the reader to the books by 
Anderson (1958), Parzen (1960), Gnedenko (1962), Papoulis 
(1965), Thomasian (1969), Feller (1971), Laha and Rohatgi 
(1979), Rao (1984), Shiryaev (1995) and Bauer (1996). 

In this section we have given various examples which clarify the 
properties of the normal distribution. 


10.1. Non-normal bivariate distributions with normal 
marginals 

(i) Take two independent r.v.s $\xi_1$ and $\xi_2$, each distributed $N(0, 1)$.

Consider the following two-dimensional random vector:

$(X_1, X_2) = \begin{cases} (\xi_1, |\xi_2|), & \text{if } \xi_1 \ge 0 \\ (\xi_1, -|\xi_2|), & \text{if } \xi_1 < 0. \end{cases}$

Obviously the distribution of $(X_1, X_2)$ is not bivariate normal,
but each of the components $X_1$ and $X_2$ is normally distributed.


(ii) Suppose $h(x)$, $x \in \mathbb{R}^1$, is any odd continuous function vanishing
outside the interval $[-1, 1]$ and satisfying the condition
$|h(x)| \le (2\pi e)^{-1/2}$. Using the standard normal density $\varphi$ we define the
function

(1) $f(x, y) = \varphi(x)\varphi(y) + h(x)h(y).$

It is easy to check that $f(x, y)$, $(x, y) \in \mathbb{R}^2$, is a two-dimensional
density function and $f(x, y)$ is not bivariate normal, but the marginal
densities $f_1(x)$ and $f_2(y)$ are both normal.

The function $h$ in (1) can be chosen as follows:

$h(x) = (2\pi e)^{-1/2} x^3 I_{[-1,1]}(x), \quad x \in \mathbb{R}^1$

where $I_{[-1,1]}(\cdot)$ is the indicator function of the interval $[-1, 1]$.
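
The claims in case (ii) can be checked numerically. The following is only a sketch under stated assumptions: the plane is truncated to the square [-8, 8] x [-8, 8], the integrals are replaced by Riemann sums with step 0.01, and all names (`phi`, `h`, `f`, `grid`) are illustrative. It checks that $f$ is non-negative up to rounding, that both marginals agree with the standard normal density, and that $f$ differs from the product $\varphi(x)\varphi(y)$.

```python
import numpy as np

def phi(x):
    """Standard normal density."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def h(x):
    """h(x) = (2*pi*e)^(-1/2) * x^3 on [-1, 1] and 0 elsewhere."""
    inside = np.abs(x) <= 1.0
    return np.where(inside, x**3, 0.0) / np.sqrt(2.0 * np.pi * np.e)

def f(x, y):
    """Joint density (1): phi(x) phi(y) + h(x) h(y)."""
    return phi(x) * phi(y) + h(x) * h(y)

# Symmetric grid on [-8, 8] (assumed truncation of the plane).
step = 0.01
half = step * np.arange(1, 801)
grid = np.concatenate((-half[::-1], [0.0], half))
X, Y = np.meshgrid(grid, grid, indexing="ij")
F = f(X, Y)

# Smallest value of f on the grid (should be >= 0 up to rounding).
min_value = F.min()

# Marginals obtained by summing out the other coordinate.
marginal_x = F.sum(axis=1) * step
marginal_y = F.sum(axis=0) * step
max_marginal_error = max(
    np.max(np.abs(marginal_x - phi(grid))),
    np.max(np.abs(marginal_y - phi(grid))),
)

# Largest deviation of f from the independent bivariate normal density.
max_perturbation = np.max(np.abs(F - phi(X) * phi(Y)))
```

The perturbation $h(x)h(y)$ integrates to zero in each variable because $h$ is odd, which is why the marginals are unaffected.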


(iii) For any number $\varepsilon$, $|\varepsilon| < 1$, define the function

$H(x, y) = \Phi(x)\Phi(y)[1 + \varepsilon(1 - \Phi(x))(1 - \Phi(y))], \quad (x, y) \in \mathbb{R}^2$

($\Phi$ is the standard normal d.f.). It is easy to check that $H$ is a two-
dimensional d.f. with marginal distributions $\Phi(x)$ and $\Phi(y)$
respectively. Obviously, if $\varepsilon \ne 0$, $H$ is non-normal.

Another possibility is to take the function

$h(x, y) = \varphi(x)\varphi(y)[1 + \varepsilon(2\Phi(x) - 1)(2\Phi(y) - 1)].$

Then $h(x, y)$ is a two-dimensional density function with marginals
$\varphi(x)$ and $\varphi(y)$ respectively, and $h(x, y)$ is non-normal if $\varepsilon \ne 0$.


(iv) Consider the following function: 
te,y) = ¢ U/L — ey) exp[-g0*(@" — 2pay + y")], if ay 2 0 
oa 0, if ry <0 


where $\rho \in (-1, 1)$. It is easy to verify that $f$ is a two-dimensional
density function. Denote by $(X, Y)$ the random vector whose density
is $f(x, y)$. Obviously the distribution of $(X, Y)$ is not normal, but
each of the components $X$ and $Y$ is distributed $N(0, 1)$.


10.2. If $(X_1, X_2)$ has a bivariate normal distribution then $X_1$, $X_2$
and $X_1 + X_2$ are normally distributed, but not
conversely


Let $(X_1, X_2)$ have a bivariate normal distribution and $f(x_1, x_2)$,
$(x_1, x_2) \in \mathbb{R}^2$, be its density. Then each of the r.v.s $X_1$, $X_2$ and
$X_1 + X_2$ has a one-dimensional normal distribution. We are interested in
whether the converse is true.

Suppose $X_1$, $X_2$ are independent r.v.s each distributed $N(0, 1)$.
Then their joint density is $f(x_1, x_2) = \varphi(x_1)\varphi(x_2)$ ($\varphi$ is the
standard normal density). The function $f(x_1, x_2)$, which is
symmetric in both arguments $x_1$, $x_2$, will be changed as follows.




Figure 2 


Firstly, let us draw eight equal squares at a fixed distance from the
origin $O$ and located symmetrically about the axes $Ox_1$ and $Ox_2$ as
shown in figure 2. Put alternately the signs (+) and (-) in the
squares. Let the small positive number $\varepsilon$ denote the amount of
'mass' which we transfer from a square with (-) to a square with
(+). Now define the function

(1) $g(x_1, x_2) = \begin{cases} f(x_1, x_2) + \varepsilon, & \text{if } (x_1, x_2) \in Q^+ \\ f(x_1, x_2) - \varepsilon, & \text{if } (x_1, x_2) \in Q^- \\ f(x_1, x_2), & \text{if } (x_1, x_2) \notin Q^+ \cup Q^- \end{cases}$

where $Q^+$ is the union of the squares with (+) and $Q^-$ the union of
those with (-).

For such squares we can choose $\varepsilon > 0$ sufficiently small such
that $g(x_1, x_2) \ge 0$ for all $(x_1, x_2) \in \mathbb{R}^2$. From (1) we find
immediately that $\iint_{\mathbb{R}^2} g(x_1, x_2)\,dx_1\,dx_2 = 1$. Hence $g$ is a density
function of a two-dimensional random vector, say $(Y_1, Y_2)$. Next
we want to find the distributions of $Y_1$, $Y_2$ and $Y_1 + Y_2$. The strips
drawn in figure 2 will help us to do this. These strips can be
arbitrarily wide and arbitrarily located but parallel to $Ox_1$ or $Ox_2$ or
to the bisector of quadrants II and IV. Evidently the strips either do
not intersect any of the squares, or each intersects just two of them,
one signed by (+) and another by (-). Since the total mass in any
strip remains unchanged and we know the distribution of $(X_1, X_2)$
(recall it is normal), we easily conclude that
$Y_1 \sim N(0, 1)$, $Y_2 \sim N(0, 1)$, $Y_1 + Y_2 \sim N(0, 2)$. For example, look
at the pairs of strips $(S_1, S_1')$, $(S_2, S_2')$, $(S_3, S_3')$. However it is clear
that the distribution of $(Y_1, Y_2)$ given by the density (1) is not
bivariate normal.

Therefore the normality of $Y_1$, $Y_2$ and $Y_1 + Y_2$ is not enough to
ensure that $(Y_1, Y_2)$ is normally distributed.


10.3. A non-normally distributed random vector such that any 
proper subset of its components consists of jointly 
normally distributed and mutually independent random 
variables 


We present here two examples based on different ideas. The first 
one is related to Example 7.3. 


(i) Let the r.v. $X$ have a distribution $N(a, \sigma^2)$ and let $f$ be its
density. Take $n$ r.v.s, say $X_1, \ldots, X_n$, $n \ge 3$, and define their joint
density $g_n$ as follows:

(1) $g_n(x_1, \ldots, x_n) = \left[\prod_{j=1}^{n} f(x_j)\right]\left[1 + \prod_{j=1}^{n} (x_j - a) f(x_j)\right], \quad (x_1, \ldots, x_n) \in \mathbb{R}^n.$

Firstly we have to check that $g_n$ is a probability density. Since in
this case we know $f$ explicitly, $f(x) = (2\pi\sigma^2)^{-1/2} \exp[-(x - a)^2/2\sigma^2]$,
we can easily find that $\int_{\mathbb{R}^1} (x - a) f^2(x)\,dx = 0$. Then
we can derive that $g_n$ is non-negative and its integral over $\mathbb{R}^n$ is 1.
Thus $g_n$, given by (1), is a density and, as we accepted, $g_n$ is the
density of the vector $(X_1, \ldots, X_n)$.

Let us choose $k$ of the variables $X_1, \ldots, X_n$, $2 \le k \le n - 1$. Without
loss of generality assume that $X_1, \ldots, X_k$ is our choice. Denote by $g_k$
the density function of $(X_1, \ldots, X_k)$. From (1) we obtain

(2) $g_k(x_1, \ldots, x_k) = f(x_1) \cdots f(x_k)$

(recall that $f$ is the density of $X$ and $X \sim N(a, \sigma^2)$). Therefore the
variables $X_1, \ldots, X_k$ are jointly normally distributed and, moreover,
they are independent. This conclusion holds for all choices of $k$
variables among $X_1, \ldots, X_n$ and, let us repeat, $2 \le k \le n - 1$. It is
also clear that each $X_j$, $j = 1, \ldots, n$, has a normal distribution
$N(a, \sigma^2)$.

Therefore we have described a set of $n$ r.v.s which, according to
(1), are dependent but, as follows from (2), are $(n - 1)$-wise
independent.


(ii) Let $(X_1, \ldots, X_n)$ be an n-dimensional normally distributed
random vector. Then its distribution is determined uniquely if we
know the distributions of all pairs $(X_i, X_j)$, $i, j = 1, \ldots, n$. This
observation leads to the question of whether $(X_1, \ldots, X_n)$ is
necessarily normal if all the pairs $(X_i, X_j)$ are two-dimensional
normal vectors. We shall show that the joint normality of all pairs
and even of all $(n - 1)$-tuples does not imply that $(X_1, \ldots, X_n)$ is
normally distributed. (Look at case (i) above.)
Firstly, let $n \ge 3$ and $(\xi_1, \ldots, \xi_n)$ be such that $\xi_j = \pm 1$ and any
particular sign vector $(\gamma_1, \ldots, \gamma_n)$ is taken with probability $p$ if
$\prod_{j=1}^{n} \gamma_j = +1$ and with probability $q = 2^{-(n-1)} - p$ if $\prod_{j=1}^{n} \gamma_j = -1$.
Here $0 \le p \le 2^{-(n-1)}$. It is not difficult to see that all subsets of $n - 1$
of the r.v.s $\xi_1, \ldots, \xi_n$ are independent (that is, any $n - 1$ of them
are mutually independent). Moreover, if $p \ne 2^{-n}$, all $n$ variables are
not mutually independent. Indeed, if $1 \le k < n$ the vector
$(\gamma_{i_1}, \ldots, \gamma_{i_k})$ can be extended in $2^{n-k-1}$ ways to a vector $(\gamma_1, \ldots, \gamma_n)$ with
$\prod_{j=1}^{n} \gamma_j = 1$ and in as many ways to one for which $\prod_{j=1}^{n} \gamma_j = -1$.
Thus

$P[\xi_{i_1} = \gamma_{i_1}, \ldots, \xi_{i_k} = \gamma_{i_k}] = 2^{n-k-1} p + 2^{n-k-1} q = 2^{-k}$

and this equality holds for any $k \le n - 1$. Hence $\xi_1, \ldots, \xi_n$ are
$(n - 1)$-wise independent. Since $P[\xi_1 = 1, \ldots, \xi_n = 1] = p$ it is obvious
that $\xi_1, \ldots, \xi_n$ are not independent when $p \ne 2^{-n}$.
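
The counting argument for the sign vector can be confirmed by exhaustive enumeration. The sketch below uses exact rational arithmetic; the choices $n = 4$ and $p = 1/10$ are illustrative assumptions (any $p$ in $[0, 2^{-(n-1)}]$ other than $2^{-n}$ would do), and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def sign_law(n, p):
    """Law of the sign vector: probability p if the product of the signs
    is +1 and q = 2^-(n-1) - p if the product is -1."""
    q = Fraction(1, 2 ** (n - 1)) - p
    law = {}
    for signs in product((-1, 1), repeat=n):
        sign_product = 1
        for s in signs:
            sign_product *= s
        law[signs] = p if sign_product == 1 else q
    return law

def marginal(law, idx):
    """Joint law of the coordinates listed in idx."""
    out = {}
    for signs, prob in law.items():
        key = tuple(signs[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + prob
    return out

n = 4
p = Fraction(1, 10)  # assumed value, 0 <= p <= 1/8 and p != 1/16
law = sign_law(n, p)
total_mass = sum(law.values())

# Every (n-1)-subset is uniform on {-1, +1}^(n-1), i.e. it consists of
# n-1 independent symmetric signs.
subsets_uniform = all(
    all(prob == Fraction(1, 2 ** (n - 1)) for prob in marginal(law, idx).values())
    for idx in combinations(range(n), n - 1)
)

# The full vector is uniform on {-1, +1}^n only when p = 2^-n.
joint_is_product = all(prob == Fraction(1, 2 ** n) for prob in law.values())
```

With these values `subsets_uniform` is true while `joint_is_product` is false, which is exactly the $(n - 1)$-wise independence without mutual independence described above.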

Now take $Z_1, \ldots, Z_n$ to be $n$ mutually independent standard
normal r.v.s which are independent of the vector $(\xi_1, \ldots, \xi_n)$. Define
a new vector $(X_1, \ldots, X_n)$ where $X_j = \xi_j |Z_j|$, $j = 1, \ldots, n$. Then clearly
the $X_j$ are again standard normal. The independence of the $Z_j$
together with the above reasoning concerning the properties of the
vector $(\xi_1, \ldots, \xi_n)$ imply that all subsets of $n - 1$ of the variables
$X_1, \ldots, X_n$ are independent. Thus any $(n - 1)$-tuple out of $X_1, \ldots, X_n$
has an $(n - 1)$-dimensional normal distribution. It remains for us to
clarify whether all $n$ variables $X_1, \ldots, X_n$ are independent. It is easy
to see that

$P[X_1 > 0, \ldots, X_n > 0] = P[\xi_1 = 1, \ldots, \xi_n = 1] = p.$

We conclude from this that if $p \ne 2^{-n}$ then the variables $X_1, \ldots, X_n$
are not independent and not normally distributed.

Let us note finally that in both cases, (i) and (ii), the joint
normality and the mutual independence of any $n - 1$ of the
variables $X_1, \ldots, X_n$ do not imply that the vector $(X_1, \ldots, X_n)$ is
normally distributed.


10.4. The relationship between two notions: normality and 
uncorrelatedness 


Let $(X, Y)$ be a random vector with normal distribution. Recall that
both $X$ and $Y$ are also normally distributed, and if $X$ and $Y$ are
uncorrelated, they are independent.

The examples below will show how important the normality of
$(X, Y)$ is.

(i) Let $X \sim N(0, 1)$. For a fixed number $c > 0$ define the r.v. $Y$ by

$Y = \begin{cases} X, & \text{if } |X| \le c \\ -X, & \text{if } |X| > c. \end{cases}$

It is easy to see that $Y \sim N(0, 1)$ for each $c$. Further,

$E[XY] = E[X^2 I(|X| \le c)] - E[X^2 I(|X| > c)].$

This implies that $E[XY] = -1$ if $c = 0$ and $E[XY] \to 1$ as $c \to \infty$.
Since $E[XY]$ depends continuously on $c$, there exists $c_0$ for which
$\rho(X, Y) = E[XY] = 0$. In fact, $c_0 \approx 1.54$ is the only solution of the
equation $E[XY] = 4\int_0^c x^2 \varphi(x)\,dx - 1 = 0$ ($\varphi$ is the standard normal
density). For this $c_0$ the r.v.s $X$ and $Y$ are uncorrelated. However,
$P[X > c, Y > c] = 0 \ne P[X > c]P[Y > c]$ and hence $X$ and $Y$ are not
independent.
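
The value of $c_0$ can be recovered numerically. The sketch below relies on the identity $\int_0^c x^2 \varphi(x)\,dx = \Phi(c) - \frac{1}{2} - c\varphi(c)$, obtained by integration by parts (an added step, not taken from the text), and on plain bisection; the function names and the bracketing interval $[0, 10]$ are assumptions made for illustration.

```python
import math

def std_normal_cdf(x):
    """Standard normal d.f. expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cov_xy(c):
    """E[XY] = 4 * int_0^c x^2 phi(x) dx - 1, using
    int_0^c x^2 phi(x) dx = Phi(c) - 1/2 - c * phi(c)."""
    return 4.0 * (std_normal_cdf(c) - 0.5 - c * std_normal_pdf(c)) - 1.0

def solve_c0(lo=0.0, hi=10.0, tol=1e-12):
    """Bisection: cov_xy is increasing in c, negative at lo, positive at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cov_xy(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c0 = solve_c0()
```

Bisection is sufficient here because $E[XY]$ is continuous and strictly increasing in $c$, so the root is unique.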

(ii) Let $\varphi_1(x, y)$ and $\varphi_2(x, y)$, $(x, y) \in \mathbb{R}^2$, be standard bivariate
normal densities with correlation coefficients $\rho_1$ and $\rho_2$
respectively. Define

$f(x, y) = c_1\varphi_1(x, y) + c_2\varphi_2(x, y), \quad (x, y) \in \mathbb{R}^2$

where $c_1$, $c_2$ are arbitrary numbers, $c_1, c_2 \ge 0$, $c_1 + c_2 = 1$.

One can see that $f$ is non-normal if $\rho_1 \ne \rho_2$. If we denote by
$(X, Y)$ a random vector with density $f$ then we can easily find that
$X \sim N(0, 1)$, $Y \sim N(0, 1)$. Moreover, the correlation coefficient
between $X$ and $Y$ is $\rho = c_1\rho_1 + c_2\rho_2$. Choosing $c_1$, $c_2$, $\rho_1$, $\rho_2$ such
that $c_1\rho_1 + c_2\rho_2 = 0$, we obtain two normally distributed and
uncorrelated r.v.s $X$ and $Y$. However, they are not independent.


(iii) Let $(X, Y)$ be a two-dimensional random vector with density

$f(x, y) = \frac{1}{2\pi\sqrt{3}}\left\{\exp\left[-\frac{2}{3}(x^2 + xy + y^2)\right] + \exp\left[-\frac{2}{3}(x^2 - xy + y^2)\right]\right\}, \quad (x, y) \in \mathbb{R}^2.$

Obviously the distribution of $(X, Y)$ is not bivariate normal. Direct
calculation shows that $X \sim N(0, 1)$, $Y \sim N(0, 1)$ and $E[XY] = 0$.
Thus $X$ and $Y$ are uncorrelated but dependent.


(iv) Let $X = \xi_1 + i\xi_2$ and $Y = \xi_3 + i\xi_4$ where $i = \sqrt{-1}$ and
$(\xi_1, \xi_2, \xi_3, \xi_4)$ is a normally distributed random vector with zero mean
and covariance matrix


z=, 0 0 | 


The reader can check that $C$ is a covariance matrix. Since $X$ and $Y$
are complex-valued, their covariance is

$E[X\bar{Y}] = E[\xi_1\xi_3 + \xi_2\xi_4] + iE[\xi_2\xi_3 - \xi_1\xi_4] = 0 + i(-1 + 1) = 0.$

Hence $X$ and $Y$ are uncorrelated. Let us see if they are independent.
If so, then $\xi_1$ and $\xi_4$ would be independent, and thus uncorrelated.
But $E[\xi_1\xi_4] = -1$. This contradiction shows that $X$ and $Y$ are
dependent.


10.5. It is possible that $X$, $Y$, $X + Y$, $X - Y$ are each normally
distributed, $X$ and $Y$ are uncorrelated, but $(X, Y)$ is not
bivariate normal


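Consider the following function:

$f(x, y) = \frac{1}{2\pi} \exp\left[-\frac{1}{2}(x^2 + y^2)\right]\left\{1 + xy(x^2 - y^2)\exp\left[-\frac{1}{2}(x^2 + y^2 + 2\varepsilon)\right]\right\}$

where $(x, y) \in \mathbb{R}^2$ and the constant $\varepsilon > 0$ is chosen in such a way
that

$\left|xy(x^2 - y^2)\exp\left[-\frac{1}{2}(x^2 + y^2 + 2\varepsilon)\right]\right| \le 1.$
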
In order to establish that $f$ is a two-dimensional probability
density function and then derive some other properties, it is best to
find first the Fourier transform $\phi$ of $f$. We have

$\phi(s, t) = \iint_{\mathbb{R}^2} \exp(isx + ity) f(x, y)\,dx\,dy$
$\qquad = \exp\left[-\frac{1}{2}(s^2 + t^2)\right] + \frac{1}{32} st(s^2 - t^2)\exp\left[-\varepsilon - \frac{1}{4}(s^2 + t^2)\right], \quad (s, t) \in \mathbb{R}^2.$

From this we deduce the following conclusions.
(1) Since $\phi(0, 0) = 1$, $f(x, y)$ is the density of a two-dimensional
vector $(X, Y)$.
(2) $\phi(t, 0) = \phi(0, t) = \exp(-\frac{1}{2}t^2) \Rightarrow X \sim N(0, 1)$, $Y \sim N(0, 1)$.
(3) $\phi(t, t) = \exp(-t^2)$, that is $X + Y \sim N(0, 2)$.
(4) $X - Y$ is also normally distributed.
(5) $X$ and $Y$ are uncorrelated.

However, the random vector $(X, Y)$ as defined by the density $f(x, y)$
is not bivariate normal despite the fact that properties (2) - (5) are
satisfied.


10.6. If the distribution of $(X_1, \ldots, X_n)$ is normal, then any linear
combination and any subset of $X_1, \ldots, X_n$ is normally
distributed, but there is a converse statement which is
not true


This example can be considered as a natural continuation of
Examples 10.2 and 10.5. Let us introduce the function

(1) $f_\varepsilon(x_1, \ldots, x_n) = (2\pi)^{-n/2}\left[\prod_{k=1}^{n} \varphi_0(x_k)\right]\left\{1 + \varepsilon(x_1^2 - x_2^2)\prod_{k=1}^{n} x_k I_{(-1,1)}(x_k)\exp\left(\frac{1}{2}\sum_{k=1}^{n} x_k^2\right)\right\}$

where $(x_1, \ldots, x_n) \in \mathbb{R}^n$, $\varphi_0(x) = \exp(-\frac{1}{2}x^2)$, $I_{(-1,1)}$ is the indicator
function of the interval $(-1, 1)$ and the constant $\varepsilon$ is chosen such
that

$\left|\varepsilon(x_1^2 - x_2^2)\prod_{k=1}^{n} x_k I_{(-1,1)}(x_k)\exp\left(\frac{1}{2}\sum_{k=1}^{n} x_k^2\right)\right| \le 1.$

Under this condition we can check that $f_\varepsilon$ is a density of some n-
dimensional random vector, say $(X_1, \ldots, X_n)$. Evidently the density
$f_\varepsilon$ defined by (1) is not normal.


Now let us derive some statements for the distributions of the
components of $(X_1, \ldots, X_n)$. For this purpose we find the ch.f. $\phi$ of
$f_\varepsilon$ explicitly, namely


(2) o(ty, ao ae bat _ va (th) qe C (ty )e b(ty) = w(t uh(L2)) IT (tr, ) 


ba 1 
where 
w(t) = wal \(sint —tcost), if t 40 
if t= 0, 
B(t) = il +(6/)o@), if t 40 
0, f£i=0. 


From (2) one can draw the following conclusions.

(a) Each of the components $X_1, \ldots, X_n$ is distributed $N(0, 1)$.
(b) For each $k$, $k < n$, the vector $(X_{i_1}, \ldots, X_{i_k})$ is normally distributed.
(c) If $U = X_1 \pm X_2$ and $V$ is any linear combination of the variables
$X_3, \ldots, X_n$, then $U + V$ is normally distributed.
(d) If $a_1, \ldots, a_n$ are real numbers such that $a_k \ne 0$ for $k = 1, \ldots, n$ and
$|a_1| \ne |a_2|$ then $\sum_{k=1}^{n} a_k X_k$ is not normally distributed.
(e) $E\left[\prod_{k=1}^{n} X_k\right] = 0$.

For the particular case $n = 2$ (which can be compared with
Example 10.5) we obtain that: (a) $\Rightarrow$ $X_1$ and $X_2$ have standard
normal distribution; (c) $\Rightarrow$ $X_1 + X_2$ and $X_1 - X_2$ are normally
distributed; (e) $\Rightarrow$ $X_1$ and $X_2$ are uncorrelated. However $(X_1, X_2)$ is
not normal, which follows from (d).

Return again to the general case. Let $U = X_1 \pm X_2$, $U_1$ be a linear
combination of any $k$ of the variables $X_3, X_4, \ldots, X_n$, $0 \le k \le n - 2$,
and $Y$ be a linear combination of the remaining $n - k - 2$ of these
variables. Then the r.v.s $X = U + U_1$ and $Y$ are independent and
normally distributed. Indeed, $X$ and $Y$ are uncorrelated and normal
r.v.s and (c) implies that a countably infinite number of distinct
linear combinations of them are distributed normally.


10.7. The condition characterizing the normal distribution by 
normality of linear combinations cannot be weakened 


Let us start with the formulation of the following result (Hamedani
and Tata 1975). Suppose $\{(a_k, b_k), k = 1, 2, \ldots\}$ is a countable
'distinct' sequence in $\mathbb{R}^2$ such that for each $k$, $a_k X + b_k Y$ is a
normal r.v. Then $(X, Y)$ has a bivariate normal distribution. (Here
'distinct' means that the parametric equations $t_1 = a_k t$, $t_2 = b_k t$
represent an infinite number of lines in $\mathbb{R}^2$.)

We are now interested in whether the condition of this theorem
can be weakened. More precisely, let $X$ and $Y$ be r.v.s satisfying the
following condition:

($C_N$) for given $N$ pairs $(a_k, b_k)$, $k = 1, \ldots, N$, $N$ a fixed natural
number, the linear combinations $a_k X + b_k Y$, $k = 1, \ldots, N$, are
normally distributed.

The question is whether ($C_N$) implies that $(X, Y)$ has a bivariate
normal distribution. To see this, consider the following function:


N 
o(s,t) = exp [—5(s° +t *)] + exp |-¢ (s? + t*) 1 |T bis” — att? 


where $s, t \in \mathbb{R}^1$, $\varepsilon, c \in \mathbb{R}^+$.

Firstly, we shall show that for a suitable choice of $\varepsilon$ and $c$, $\phi(s, t)$
is the ch.f. of some two-dimensional distribution. Indeed,
denoting by $f(x, y)$ the inverse Fourier transform of $\phi(s, t)$, we
obtain:

$f(x, y) = (2\pi)^{-2}\iint_{\mathbb{R}^2} \exp(-isx - ity)\phi(s, t)\,ds\,dt$
$\qquad = (2\pi)^{-1}\exp\left[-\frac{1}{2}(x^2 + y^2)\right] + h_{\varepsilon,c}(x, y),$


healt) i= =e "Ihe" exp(—isax — ity) exp |—3 c(s* +t? *)| 
x el ai Vase; 


Further, we need an estimate for the function $h_{\varepsilon,c}$ which has just
been introduced. It can be shown (Hamedani and Tata 1975) that
for suitably chosen constants $\varepsilon$ and $c$

(1) $|h_{\varepsilon,c}(x, y)| \le (2\pi)^{-1}\exp\left[-\frac{1}{2}(x^2 + y^2)\right]$ for all $(x, y) \in \mathbb{R}^2$.

Since $\phi(s, t)$ is a continuous function, $\phi(0, 0) = 1$ and $f(x, y)$ is
real-valued, we conclude that $f$ and $\phi$ are a pair of functions where $f$
is a two-dimensional density and $\phi$ its corresponding ch.f. Denote
by $(\xi, \eta)$ the random vector whose density is $f$. Further, the
definition of $\phi$ immediately implies that

$\phi(a_k t, b_k t) = \exp\left[-\frac{1}{2}(a_k^2 + b_k^2)t^2\right], \quad k = 1, \ldots, N.$

This means that the r.v. $a_k\xi + b_k\eta$ is normally distributed,
$N(0, a_k^2 + b_k^2)$, for $k = 1, \ldots, N$. However, $\phi$ itself is not the ch.f. of a
bivariate normal distribution.

Therefore we have constructed a pair of r.v.s, $\xi$ and $\eta$, for which
condition ($C_N$) is satisfied, but $(\xi, \eta)$ is not normal. Thus ($C_N$) is
not enough for normality of $(\xi, \eta)$. It should be noted that condition
(1) holds only if $N$ is finite.


10.8. Non-normal distributions such that all or some of the 
conditional distributions are normal 


(i) Let $f(x, y)$, $(x, y) \in \mathbb{R}^2$, be a bivariate normal density. Then it is
easy to check that each of the conditional densities $f_1(x|y)$ and
$f_2(y|x)$ is normal. This observation leads naturally to the question
of whether the converse statement is true. We shall show that the
answer is negative.

Consider the following function:

(1) $g(x, y) = C\exp[-(1 + x^2)(1 + y^2)], \quad (x, y) \in \mathbb{R}^2.$


Here $C > 0$ is a norming constant such that $\iint_{\mathbb{R}^2} g(x, y)\,dx\,dy = 1$.
A standard calculation shows that the conditional densities
$g_1(x|y)$ and $g_2(y|x)$ of $g(x, y)$ are expressed as follows:

$g_1(x|y) = (2\pi\sigma_1^2)^{-1/2}\exp(-x^2/2\sigma_1^2),$
$g_2(y|x) = (2\pi\sigma_2^2)^{-1/2}\exp(-y^2/2\sigma_2^2),$

where $\sigma_1^2 = 1/(2(1 + y^2))$, $\sigma_2^2 = 1/(2(1 + x^2))$, $x \in \mathbb{R}^1$, $y \in \mathbb{R}^1$.
Obviously $g_1(x|y)$ and $g_2(y|x)$ are normal densities of $N(0, \sigma_1^2)$
and $N(0, \sigma_2^2)$ respectively. However, $g(x, y)$ given by (1) is not a
two-dimensional normal density.

Therefore the normality of the conditional densities does not
imply that the two-dimensional density is normal. Let us note that
similar properties hold for any density (non-normal) of the type

$g(x, y) = C\exp\left[-\sum_{i,j=0}^{2} b_{ij} x^i y^j\right]$

(for details see Castillo and Galambos (1987, 1989)). One
particular case of $g$ is the function $g$ given by (1).


(ii) Consider now another interesting situation. Let $\xi$ be a r.v.
distributed uniformly on the interval $[0, 1]$ and $\eta_1, \eta_2, \eta_3, \eta_4, \eta_5$
be independent r.v.s each with distribution $N(0, 1)$. Suppose
additionally that $\xi$ and $\eta_j$, $j = 1, \ldots, 5$, are independent. Define the r.v.s

$X_1 = \sqrt{\xi}\,\eta_1 + \sqrt{1 - \xi}\,\eta_2, \quad X_2 = \sqrt{\xi}\,\eta_3 + \sqrt{1 - \xi}\,\eta_2, \quad X_3 = \sqrt{\xi}\,\eta_4 + \sqrt{1 - \xi}\,\eta_5.$

It is then not difficult to check that each of $X_1$, $X_2$, $X_3$ has a
standard normal distribution $N(0, 1)$. Further, if $\phi_{3|1,2}(t)$ denotes
the conditional ch.f. of $X_3$ given $X_1$, $X_2$, then we find

$\phi_{3|1,2}(t) = E[e^{itX_3}|X_1, X_2] = E\{E[e^{itX_3}|X_1, X_2, \xi]\} = e^{-t^2/2}$

and hence $X_3$, conditionally on $X_1$ and $X_2$, has a normal
distribution $N(0, 1)$. So, given these properties we conjecture that
the vector $(X_1, X_2, X_3)$ has a trivariate normal distribution. Let us
check whether or not this is correct. For this purpose we compute
the joint ch.f. $\psi(t_1, t_2)$ of $X_1$ and $X_2$ as follows:

$\psi(t_1, t_2) = E\{\exp(it_1 X_1 + it_2 X_2)\} = E\{E[\exp(it_1 X_1 + it_2 X_2)|\xi]\}$
$\qquad = E\{\exp[-\frac{1}{2}t_1^2\xi - \frac{1}{2}t_2^2\xi - \frac{1}{2}(t_1 + t_2)^2(1 - \xi)]\}$
$\qquad = \exp[-\frac{1}{2}(t_1 + t_2)^2]\,E[\exp(t_1 t_2 \xi)] = (t_1 t_2)^{-1}(e^{t_1 t_2} - 1)\exp[-\frac{1}{2}(t_1 + t_2)^2].$

This form of the ch.f. $\psi(t_1, t_2)$ shows that the distribution of
$(X_1, X_2)$ is not bivariate normal. Therefore the vector $(X_1, X_2, X_3)$
cannot have a trivariate normal distribution despite the normality
of each of the components $X_1$, $X_2$, $X_3$ and the conditional
normality of $X_3$ given $X_1$, $X_2$.

Note that under some additional assumptions the conditional
normality of the components of a random vector will imply the
normality of the vector itself (see Ahsanullah 1985; Ahsanullah
and Sinha 1986; Bischoff and Fieger 1991; Hamedani 1992).


10.9. Two random vectors with the same normal distribution 
can be obtained in different ways from independent 
standard normal random variables 


Recall that there are a few equivalent definitions of a multi-variate
normal distribution. According to one of them a set of r.v.s
$X_1, \ldots, X_N$ with zero means is said to have a multi-variate normal
distribution if these variables are linear combinations of
independent r.v.s, say $\xi_1, \ldots, \xi_M$, where $\xi_j \sim N(0, 1)$ for $j = 1, \ldots, M$.
That is, we have

(1) $X_k = \sum_{j=1}^{M} c_{kj}\xi_j, \quad k = 1, \ldots, N.$

Note that there is no restriction on $M$. It may be possible that
$M < N$, $M = N$ or $M > N$.
Suppose we are given the r.v.s $\xi_1, \ldots, \xi_M$ which are independent
and distributed $N(0, 1)$. Any fixed matrix $(c_{kj})$ generates by (1) a
random vector with a multi-variate normal distribution. Then the
natural question which arises is whether different matrices generate
random vectors with different (multi-variate normal) distributions.
To find the answer we need the following result (see Breiman
1969): if both random vectors $(X_1, \ldots, X_N)$ and $(Y_1, \ldots, Y_N)$ have a
multi-variate normal distribution with the same mean and the same
covariance matrix, then they have the same distribution.

Now we shall use this result to answer the above question.
According to (1) each of the vectors $(X_1, \ldots, X_N)$ and $(Y_1, \ldots, Y_N)$ is a
transformation of independent $N(0, 1)$ r.v.s obtained by using a
matrix. Thus the question to be considered is whether this matrix is
unique for both vectors. By a simple example we can show that the
answer is negative.


Take $\xi_1$ and $\xi_2$ to be independent $N(0, 1)$ r.v.s and let

$X_1 = \xi_1 + \xi_2, \quad X_2 = 2\xi_1 + \xi_2.$

Define also

$Y_1 = \sqrt{2}\,\xi_1, \quad Y_2 = \frac{3}{\sqrt{2}}\xi_1 + \frac{1}{\sqrt{2}}\xi_2.$

Thus we obtain two random vectors, $(X_1, X_2)$ and $(Y_1, Y_2)$. It is
easy to see that $(X_1, X_2)$ has zero mean and covariance matrix
$\begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix}$. Further, $(Y_1, Y_2)$ has zero mean and the same covariance
matrix.

Moreover, both vectors, $(X_1, X_2)$ and $(Y_1, Y_2)$, are multi-variate
normal. By the above result, $(X_1, X_2)$ and $(Y_1, Y_2)$ have the same
distribution. However, as we have seen, these identically
distributed vectors are obtained in quite different ways from
independent $N(0, 1)$ r.v.s.
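
The equality of the two covariance matrices is a statement about the coefficient matrices: if a vector is obtained as $A\xi$ from independent $N(0, 1)$ components, its covariance matrix is $AA^T$. A minimal numerical sketch follows; the names `A` and `B` for the two coefficient matrices are introduced here for illustration only.

```python
import numpy as np

# Coefficient matrix of (X1, X2) = (xi1 + xi2, 2*xi1 + xi2).
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])

# Coefficient matrix of (Y1, Y2) = (sqrt(2)*xi1, (3/sqrt(2))*xi1 + (1/sqrt(2))*xi2).
B = np.array([[np.sqrt(2.0), 0.0],
              [3.0 / np.sqrt(2.0), 1.0 / np.sqrt(2.0)]])

# Covariance matrices of the two vectors.
cov_X = A @ A.T
cov_Y = B @ B.T

# B is the lower-triangular Cholesky factor of the common covariance matrix.
cholesky_factor = np.linalg.cholesky(cov_X)
```

Both products equal the matrix with rows (2, 3) and (3, 5), and `B` coincides with the Cholesky factor, so the second representation is simply the triangular one among the many matrices with the same $AA^T$.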


10.10. A property of a Gaussian system may hold even for 
discrete random variables 

A set of r.v.s $\{\xi_1, \ldots, \xi_n\}$ is said to be a Gaussian system if any of
its subsets has a Gaussian (normal) distribution. Suppose for
convenience that each $\xi_i$ has zero mean and denote
$c_{ij} = \mathrm{Cov}(\xi_i, \xi_j) = E[\xi_i\xi_j]$. Then for an arbitrary choice of four
indices $i, j, k, l$ (including any possible number of coincidences) the
following relation holds:

(1) $E[\xi_i\xi_j\xi_k\xi_l] = c_{ij}c_{kl} + c_{ik}c_{jl} + c_{il}c_{jk}.$

Note that a similar property is satisfied also for a larger even
number of variables chosen from the given Gaussian system
(including coincidences of indices). To prove such a property it is
enough to use the ch.f. of the random vector whose components
are involved in the product.

The above property has some useful applications but it is also of
independent interest. It is natural to ask if this property holds for
Gaussian systems only. If the answer were positive, then (1) would
be a property characterizing the given system of r.v.s as Gaussian.
It turns out, however, that this is not the case. Here is a simple
illustration.

Consider the sequence $\eta_1, \eta_2, \ldots$ of i.i.d. r.v.s such that

$P[\eta_1 = -\sqrt{3}] = P[\eta_1 = \sqrt{3}] = \frac{1}{6}, \quad P[\eta_1 = 0] = \frac{2}{3}.$

It is easy to see that for all choices of indices (including possible
coincidences)

$E\eta_i = 0, \quad E[\eta_i\eta_j] = c_{ij} \quad \text{and} \quad E[\eta_i\eta_j\eta_k] = 0,$

where $c_{ij} = 1$ if $i = j$ and $c_{ij} = 0$ if $i \ne j$. Direct calculations show
that

$E[\eta_i^4] = 3, \quad E[\eta_i^2\eta_j^2] = 1 \text{ for } i \ne j \quad \text{and} \quad E[\eta_i\eta_j\eta_k\eta_l] = 0$

for all other choices of indices. All these facts taken together
justify the following relation (compare with (1)):

$E[\eta_i\eta_j\eta_k\eta_l] = c_{ij}c_{kl} + c_{ik}c_{jl} + c_{il}c_{jk}$

which is valid for arbitrary indices $i, j, k, l$.

Hence (1) is satisfied for a collection of r.v.s which are far from
being Gaussian.
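
The fourth-moment identity for the three-point variables can be verified exhaustively. The sketch below uses exact rational arithmetic and the single-variable moments $E\eta = 0$, $E\eta^2 = 1$, $E\eta^3 = 0$, $E\eta^4 = 3$ that follow from the distribution above; the helper names and the choice of four distinct indices are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Moments E[eta^r], r = 0..4, for P[eta = +-sqrt(3)] = 1/6, P[eta = 0] = 2/3:
# odd moments vanish by symmetry, E[eta^2] = 3 * (1/3), E[eta^4] = 9 * (1/3).
SINGLE_MOMENT = {0: Fraction(1), 1: Fraction(0), 2: Fraction(1),
                 3: Fraction(0), 4: Fraction(3)}

def fourth_moment(indices):
    """E[eta_i eta_j eta_k eta_l] for i.i.d. variables: by independence it is
    the product of E[eta^m] over distinct indices, m being the multiplicity."""
    result = Fraction(1)
    for idx in set(indices):
        result *= SINGLE_MOMENT[indices.count(idx)]
    return result

def gaussian_formula(i, j, k, l):
    """Right-hand side c_ij c_kl + c_ik c_jl + c_il c_jk with c_ij = [i == j]."""
    def c(a, b):
        return Fraction(1) if a == b else Fraction(0)
    return c(i, j) * c(k, l) + c(i, k) * c(j, l) + c(i, l) * c(j, k)

# Four available indices cover every coincidence pattern of (i, j, k, l).
all_match = all(
    fourth_moment(idx) == gaussian_formula(*idx)
    for idx in product(range(4), repeat=4)
)
```

Only the coincidence pattern of the indices matters, so checking four available index values covers every case.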


SECTION 11. THE MOMENT PROBLEM 


Let $\{m_0 = 1, m_1, m_2, \ldots\}$ be a sequence of real numbers and $I$ be a
fixed interval, $I \subset \mathbb{R}^1$. Suppose that $\{m_n\}$ are the moments of some
d.f. $F(x)$, $x \in I$, i.e.

$m_n = \int_I x^n\,dF(x), \quad n = 0, 1, 2, \ldots.$

If $F$ is uniquely determined by the moment sequence $\{m_n\}$ we say
that the moment problem has a unique solution or that it is
determinate. Otherwise we say that the moment problem has more than
one solution or that it is indeterminate. We also say that the r.v. $X \sim F$
is determinate or indeterminate. Note that the moment problem in
the case $I = [0, \infty)$ is called the Stieltjes moment problem, while in
the case $I = (-\infty, \infty)$ we speak of the Hamburger moment problem.

Below are some criteria for the moment problem to be
determinate or indeterminate.


Criterion ($C_1$). Let $F(x)$, $x \in \mathbb{R}^1$, be a d.f. whose ch.f. $\phi(t)$, $t \in \mathbb{R}^1$,
is r-analytic for some $r > 0$. Then $F$ is uniquely determined by its
moment sequence $\{m_n\}$ where $m_n = \int_{\mathbb{R}^1} x^n\,dF(x)$. Further, the
ch.f. $\phi$ is r-analytic for some $r > 0$ iff

$\limsup_{n \to \infty} \frac{1}{2n}(m_{2n})^{1/(2n)} < \infty.$

This is equivalent to the existence of the m.g.f.
$M(t) = E[e^{tX}]$, $|t| < t_0$, $t_0 > 0$ (Cramer condition).

Criterion ($C_2$). Let $\{m_0 = 1, m_1, m_2, \ldots\}$ be the moments of a d.f.
$F(x)$, $x \in \mathbb{R}^1$, and let

$\sum_{n=1}^{\infty} (m_{2n})^{-1/(2n)} = \infty$ (Carleman condition).

Then $F$ is uniquely determined by $\{m_n\}$. If the d.f. $F$ has as support
the interval $[0, \infty)$ (instead of $(-\infty, \infty)$) then a sufficient condition
for uniqueness is $\sum_{n=1}^{\infty} (m_n)^{-1/(2n)} = \infty$.

Criterion ($C_3$). (a) Suppose the d.f. $F(x)$, $x \in \mathbb{R}^1$, is absolutely
continuous with density $f(x) > 0$, $x \in \mathbb{R}^1$, and let $F$ have moments of
all orders. If

(2a) $\int_{-\infty}^{\infty} \frac{-\log f(x)}{1 + x^2}\,dx < \infty$ (Krein condition)

then the distribution $F$ is indeterminate.

(b) Let the d.f. $F(x)$, $x \in \mathbb{R}^+$ ($F(0) = 0$), be absolutely continuous
with density $f(x) > 0$, $x > 0$ ($f(x) = 0$, $x \le 0$), and let $F$ have
moments of all orders. If

(2b) $\int_{a}^{\infty} \frac{-\log f(x^2)}{1 + x^2}\,dx < \infty$ (Krein condition)

for some $a \ge 0$, then the distribution $F$ is indeterminate.

The proof of criteria ($C_1$) and ($C_2$) can be found in Shohat and
Tamarkin (1943). Criterion ($C_3$) was suggested by Krein (1944)
and discussed intensively by Akhiezer (1965) and Berg (1995). In
these sources, as well as in Kendall and Stuart (1958), Feller
(1971), Chow and Teicher (1978) and Shiryaev (1995), the reader
will find discussions of these and other related topics.

The examples in this section clarify the role of the conditions 
which guarantee the uniqueness of the moment problem and reveal 
the relationships between different sufficient conditions. 


11.1. The moment problem for powers of the normal 
distribution 


If $\xi$ is a r.v., $\xi \sim N(a, \sigma^2)$, then the distribution of $\xi$ (the normal
distribution) as well as that of $\xi^2$ ($\chi^2$-distribution) are uniquely
determined by the corresponding moment sequences. These facts
are well known but also they can be easily checked by, e.g., the
Carleman criterion. Thus a reasonable question is: what can we say
about higher powers of $\xi$? It turns out 'the picture' changes even
for $\xi^3$. The first observation is that all moments $E[(\xi^3)^k]$, $k = 1, 2, \ldots$,
exist, however $E[\exp(t\xi^3)]$ exists only if $t = 0$. Hence the
m.g.f. does not exist, however we cannot conclude (see Criterion
($C_1$)) that the distribution of $\xi^3$ is indeterminate.


The case $\xi^3$ allows us to make a more detailed analysis. For let
us take a r.v. $\eta \sim N(0, \frac{1}{2})$ whose density is $\pi^{-1/2}\exp(-x^2)$, $x \in \mathbb{R}^1$.
Then the new r.v. $X = \eta^3$ has the following density:

$f(x) = (1/3\sqrt{\pi})\,|x|^{-2/3}\exp(-|x|^{2/3}), \quad x \in \mathbb{R}^1.$

By using some standard integrals
($\int_0^\infty (1 + x^2)^{-1}\,dx = \pi/2$, $\int_0^\infty [(\log x)/(1 + x^2)]\,dx = 0$,
$\int_0^\infty [x^\delta/(1 + x^2)]\,dx = \pi/(2\cos(\delta\pi/2))$, $-1 < \delta < 1$)
we can easily establish that

$\int_{-\infty}^{\infty} [-\log f(x)/(1 + x^2)]\,dx < \infty.$

Hence, according to the Krein criterion ($C_3$), the distribution of the
r.v. $X = \eta^3$ is not determined uniquely by the moment sequence
$\{m_k = E[X^k], k = 1, 2, \ldots\}$.

In a similar way we can show that the moment problem is
indeterminate for the distribution of any r.v. $\eta^{2n+1}$, $n = 1, 2, \ldots$.

Let us return to the r.v. $X = \eta^3$. Knowing that the distribution of
$\eta^3$ is indeterminate, we should like to describe explicitly another
r.v. with the same moments as those of $\eta^3$. One possible way to do
this is to consider the following function:

$f_\varepsilon(x) = f(x)\{1 + \varepsilon[\cos(\sqrt{3}|x|^{2/3}) - \sqrt{3}\sin(\sqrt{3}|x|^{2/3})]\}, \quad x \in \mathbb{R}^1.$

It can be shown that for $\varepsilon \in [-\frac{1}{2}, \frac{1}{2}]$, $f_\varepsilon(x)$, $x \in \mathbb{R}^1$, is a probability
density function. Denote by $X_\varepsilon$ a r.v. having $f_\varepsilon$ as its density.
Obviously $f_\varepsilon \ne f$, except for the trivial case $\varepsilon = 0$. Our further
reasoning is based on the equality

$\int_{-\infty}^{\infty} x^k f(x)[\cos(\sqrt{3}|x|^{2/3}) - \sqrt{3}\sin(\sqrt{3}|x|^{2/3})]\,dx = 0, \quad k = 1, 2, \ldots.$

This immediately implies that

$E[X_\varepsilon^k] = E[X^k], \quad k = 1, 2, \ldots$

despite the fact that $X_\varepsilon$ and $X$ are r.v.s with different distributions,
since for their densities we have $f_\varepsilon \ne f$, $\varepsilon \in [-\frac{1}{2}, \frac{1}{2}]$ (except $\varepsilon = 0$).

It is interesting (and even curious) to note that the distribution of
the absolute value $|X|$ is determinate! Indeed, the r.v. $|X| = |\eta|^3$,
where $\eta \sim N(0, \frac{1}{2})$, has a density $(2/3\sqrt{\pi})\,x^{-2/3}\exp(-x^{2/3})$ for
$x > 0$ (and 0 for $x \le 0$). Then for the moment $m_k = E[|X|^k]$ of order $k$,
$k = 1, 2, \ldots$, we have $m_k = (1/\sqrt{\pi})\,\Gamma((3k + 1)/2)$. For large $k$,
$(m_k)^{-1/(2k)} \approx ck^{-3/4}$, $c$ = constant, implying that
$\sum_k (m_k)^{-1/(2k)} = \infty$, i.e. the Carleman condition is satisfied.
Therefore in this case the moment problem is determinate.
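
The growth of the Carleman terms for $|X| = |\eta|^3$ can be inspected numerically. The sketch below evaluates $(m_k)^{-1/(2k)}$ with $m_k = \Gamma((3k + 1)/2)/\sqrt{\pi}$ through the log-gamma function; the value $(2e/3)^{3/4}$ used for the constant $c$ comes from Stirling's formula and is an added observation, not a statement of the text, and the function names are illustrative.

```python
import math

def carleman_term(k):
    """(m_k)^(-1/(2k)) with m_k = Gamma((3k + 1)/2) / sqrt(pi)."""
    log_m_k = math.lgamma((3 * k + 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(-log_m_k / (2 * k))

def partial_sum(n_terms):
    """Partial sum of the Carleman series for the Stieltjes case."""
    return sum(carleman_term(k) for k in range(1, n_terms + 1))

# Stirling's formula suggests (m_k)^(-1/(2k)) ~ c * k^(-3/4), c = (2e/3)^(3/4).
limit_constant = (2 * math.e / 3) ** 0.75
ratio_large_k = carleman_term(10**6) * (10**6) ** 0.75

# The partial sums keep growing, roughly like 4 * c * N^(1/4).
sum_small = partial_sum(10**2)
sum_large = partial_sum(10**4)
```

A finite computation cannot prove divergence; the point is only that the terms decay like $k^{-3/4}$, and the series $\sum k^{-3/4}$ diverges.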


11.2. The lognormal distribution and the moment problem 


Let $X$ be a r.v. such that $\log X \sim N(0, 1)$. In this case we say that $X$
has a (standard) lognormal distribution. The density $f$ of $X$ is given
by

(1) $f(x) = \begin{cases} (2\pi)^{-1/2} x^{-1}\exp[-\frac{1}{2}(\log x)^2], & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases}$

The moments $m_n = E[X^n]$ can be calculated explicitly, namely
$m_n = e^{n^2/2}$, $n = 1, 2, \ldots$. It is easy to check that the moments $\{m_n\}$ do
not satisfy the Carleman condition $\sum_{n=1}^{\infty} (m_n)^{-1/(2n)} = \infty$. Since
this condition is only sufficient, we cannot say whether the
sequence $\{m_n\}$ determines uniquely the d.f. $F$ of $X$. Further, we
have the following relations:

$\int_0^\infty (1 + x^2)^{-1}|\log x|^k\,dx = \int_{-\infty}^{\infty} (1 + e^{2y})^{-1}|y|^k e^y\,dy$
$\qquad \le \int_{-\infty}^{0} |y|^k e^y\,dy + \int_0^\infty |y|^k e^{-y}\,dy < \infty, \quad k = 0, 1, 2, \ldots.$

From this we conclude that the density (1) does satisfy the Krein
condition (2b). According to Criterion ($C_3$) the lognormal
distribution is not determined uniquely by its moments.
Alternatively, the same conclusion is derived by referring to
Criterion ($C_1$) after showing that $E[e^{tX}] = \infty$ for each $t > 0$, meaning
that the m.g.f. of $X$ does not exist.

Thus we come to the following interesting question: is it 
possible to find explicitly other distributions with the same 
moments as those of the lognormal distribution? Two possible 
answers are given below. 


(i) Let {f,(x), x € Rice [-1, 1]} be a family of functions defined 


as. 


. .- \ | f(x)[l+esin(2Qzlogz)|, if r>0 
(2) f(e)= 48 if «<0 


where f is given by (1). Obviously, /,(x) = 0 for all x € r! and any 
¢ € [—l1, 1]. In order to establish other properties of f,, we have to 
prove that 


(3) = | x* f(x) sin(2alogx)dz=0, k=0,1,2,.... 
JO 


Indeed, by the substitution $\log x = u = y + k$ we reduce $J_k$ to the integrals

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(ku - \tfrac12 u^2\right) \sin(2\pi u)\,du = \frac{1}{\sqrt{2\pi}} \exp\left(\tfrac12 k^2\right) \int_{-\infty}^{\infty} \exp\left(-\tfrac12 y^2\right)\sin(2\pi y)\,dy.$$

The last integral is zero since the integrand is an odd function and the interval $(-\infty, \infty)$ is symmetric with respect to 0.

So, based on (3) we draw the following conclusions concerning the family (2). If $k = 0$ then for any $\varepsilon \in [-1, 1]$, $f_\varepsilon(x)$, $x \in \mathbb{R}^1$ is a probability density function of some r.v., say $X_\varepsilon$. Obviously, if $\varepsilon = 0$, $f_\varepsilon$ and $X_\varepsilon$ are respectively $f$ and $X$ defined at the beginning of this section. Moreover, we have

$$E[X_\varepsilon^k] = E[X^k] \quad \text{for any } k = 1, 2, \ldots$$

despite the fact that $f_\varepsilon \ne f$ for $\varepsilon \ne 0$.


Therefore we have described explicitly the family {X,} 
containing infinitely many absolutely continuous r.v.s having the 
same moments as those of the r.v. X with lognormal distribution. 
This example, after the paper by Heyde (1963a) appeared, became
one of the most popular examples illustrating the classical moment 
problem. 
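The moment equality for the family (2) can be checked numerically. The sketch below is only an illustration: it applies the substitution $\log x = y + k$ by hand, so that $E[X_\varepsilon^k]/e^{k^2/2}$ becomes a Gaussian-weighted integral in $y$, and approximates that integral with a plain trapezoid rule. The function name, the integration window and the step count are assumptions made for this sketch.

```python
import math

def lognormal_family_moment_ratio(k, eps, half_width=12.0, steps=2400):
    """Trapezoid estimate of E[X_eps^k] / exp(k^2 / 2).

    After substituting log x = y + k the ratio equals
    (2*pi)^(-1/2) * integral of exp(-y^2/2) * (1 + eps*sin(2*pi*(y + k))) dy.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * math.exp(-0.5 * y * y) * (
            1.0 + eps * math.sin(2.0 * math.pi * (y + k))
        )
    return total * h / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    for k in range(5):
        print(k, lognormal_family_moment_ratio(k, 1.0))
```

Each printed ratio should agree with 1 up to rounding, whatever value of $\varepsilon \in [-1, 1]$ is used.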

(ii) Now we shall exhibit another family of d.f.s $\{H_a, a > 0\}$ having the same moments as the lognormal distribution (1). Let us announce a priori that $H_a$, for each $a > 0$, will correspond to some discrete r.v. $Y_a$.
For $a > 0$ consider the function

$$h_a(t) = \sum_{n=-\infty}^{\infty} a^{-n} e^{-n^2/2} \exp(i a e^{n} t), \quad t \in \mathbb{R}^1. \tag{4}$$


It is easy to see that the series in (4) is convergent for all $t \in \mathbb{R}^1$ and all $a > 0$. Moreover, the functions $h_a(t)$, $t \in \mathbb{R}^1$ are continuous and positive definite in the standard sense: $\sum_{k,j} z_k \bar{z}_j h_a(t_k - t_j) \ge 0$, where $t_k, t_j \in \mathbb{R}^1$ and $z_k, z_j$ are complex numbers. By the Bochner theorem (see Feller 1971) the function

$$\psi_a(t) = h_a(t)/h_a(0), \quad t \in \mathbb{R}^1$$

is a ch.f. Denote respectively by $H_a$ and $Y_a$ the d.f. and the r.v. whose ch.f. is $\psi_a$. The explicit form (4) of the function $h_a$ allows us to describe explicitly the r.v. $Y_a$. We have

$$P[Y_a = a e^{k}] := p_a(k) = a^{-k} e^{-k^2/2} / h_a(0), \quad k = 0, \pm 1, \pm 2, \ldots.$$


The next step is to find the moments $m_n = E[Y_a^n] = \sum_{k=-\infty}^{\infty} (a e^k)^n p_a(k)$, $n = 1, 2, \ldots$. Since

$$i^n m_n = \left.\frac{d^n}{dt^n}\psi_a(t)\right|_{t=0} = \frac{1}{h_a(0)}\sum_{k=-\infty}^{\infty} (i a e^{k})^n a^{-k} e^{-k^2/2},$$

we can easily obtain

$$e^{-n^2/2} m_n = i^{-n}\sum_{k=-\infty}^{\infty} (i a e^k)^n a^{-k} e^{-(n^2+k^2)/2}/h_a(0) = \sum_{k=-\infty}^{\infty} a^{-(k-n)} e^{-(k-n)^2/2}/h_a(0) = 1.$$

It follows from this that

$$E[Y_a^n] = m_n = e^{n^2/2} = E[X^n], \quad n = 1, 2, \ldots$$

Therefore there is an infinite family $\{Y_a\}$ of discrete r.v.s such that $Y_a$ has the same moments as the r.v. $X$ with lognormal distribution. We refer the reader to papers by Leipnik (1981) and Pakes and Khattree (1992) for more comments about this example and other related topics.
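The moments of the discrete r.v. $Y_a$ can also be evaluated directly from the probabilities $p_a(k)$. The following sketch truncates the two-sided series at $|k| \le 80$ (an arbitrary cutoff chosen for illustration) and works with logarithms of the terms so that no intermediate quantity overflows; the function name is hypothetical.

```python
import math

def discrete_lognormal_moment(n, a, cutoff=80):
    """Truncated E[Y_a^n] where P[Y_a = a e^k] is proportional to a^(-k) e^(-k^2/2)."""
    log_a = math.log(a)
    ks = range(-cutoff, cutoff + 1)
    # Truncated normalising constant h_a(0).
    norm = sum(math.exp(-k * log_a - 0.5 * k * k) for k in ks)
    # Each term is (a e^k)^n * a^(-k) e^(-k^2/2), assembled in the exponent.
    total = sum(math.exp(n * (log_a + k) - k * log_a - 0.5 * k * k) for k in ks)
    return total / norm

if __name__ == "__main__":
    for a in (0.5, 1.0, 3.0):
        for n in (1, 2, 3):
            print(a, n, discrete_lognormal_moment(n, a), math.exp(n * n / 2))
```

For every $a > 0$ tried, the computed value should agree with $e^{n^2/2}$ up to rounding, although the support $\{a e^k\}$ changes with $a$.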


11.3. The moment problem for powers of an exponential 
distribution 


Let $\xi$ be a r.v., $\xi \sim Exp(1)$. The density of $\xi$ is $e^{-x}$ for $x > 0$ and 0 for $x \le 0$, so the moment of order $k$ is $m_k = E[\xi^k] = k!$, $k = 1, 2, \ldots$. The distribution of $\xi$, i.e. the exponential distribution, is uniquely determined by the moment sequence $\{m_k\}$. This follows from the Carleman criterion as well as from the existence of the m.g.f. of $\xi$ and referring to Criterion (C1).

Now we want to clarify whether powers of $\xi$ have determinate distributions. Let $\delta > 0$ and let $X = \xi^\delta$. If $f$ is the density of $X$, we easily find that

$$f(x) = (1/\delta)\, x^{1/\delta - 1} \exp(-x^{1/\delta}), \ \text{if } x > 0; \quad f(x) = 0 \ \text{for } x \le 0.$$

The moments of $X$ exist and $E[X^k] = \Gamma(\delta k + 1)$, $k = 1, 2, \ldots$. We use the density $f$ and find that $\int_0^\infty [-\log f(x^2)/(1 + x^2)]\,dx < \infty$ iff $\delta > 2$. Hence for $\delta > 2$ the distribution of the r.v. $X = \xi^\delta$ is indeterminate.


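Let us show that in this case, $\xi^\delta$ with $\delta > 2$, there is a family of r.v.s all having the same moments as those of $\xi^\delta$. Indeed, consider the function:

$$f_\varepsilon(x) = f(x)\{1 + \varepsilon[\cos(c_\delta x^{1/\delta}) - (1/c_\delta)\sin(c_\delta x^{1/\delta})]\}, \quad x > 0$$

where $c_\delta = \tan(\pi/\delta)$ and $|\varepsilon| < \sin(\pi/\delta)$. The equality

$$\int_0^\infty x^k f(x)[\cos(c_\delta x^{1/\delta}) - (1/c_\delta)\sin(c_\delta x^{1/\delta})]\,dx = 0, \quad k = 0, 1, 2, \ldots$$

shows that $f_\varepsilon$ is a probability density function of a r.v. $X_\varepsilon$ and

$$E[X_\varepsilon^k] = E[X^k], \quad k = 1, 2, \ldots$$

even though $f_\varepsilon \ne f$ (except the trivial case $\varepsilon = 0$).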
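The vanishing of the integral is stated here without proof. One possible way to see it, given only as an illustrative sketch, is to substitute $x = t^\delta$ and apply the identity $\int_0^\infty t^{p-1}e^{-qt}\,dt = \Gamma(p)/q^p$, $\mathrm{Re}\,q > 0$ (the same identity that is used in Example 11.6). The symbols $\theta$ and $p$ are introduced only for this sketch.

```latex
% Notation for this sketch only: \theta = \pi/\delta \in (0, \pi/2), p = \delta k + 1, c_\delta = \tan\theta.
\begin{align*}
\int_0^\infty x^k f(x)\, e^{i c_\delta x^{1/\delta}}\,dx
  &= \int_0^\infty t^{\delta k} e^{-(1 - i c_\delta)t}\,dt
   = \frac{\Gamma(p)}{(1 - i\tan\theta)^{p}}
   = \Gamma(p)\,(\cos\theta)^{p}\, e^{i p\theta}, \\
p\theta = k\pi + \theta
  &\;\Longrightarrow\;
  \cos(p\theta) = (-1)^k\cos\theta, \quad \sin(p\theta) = (-1)^k \sin\theta, \\
\int_0^\infty x^k f(x)\Big[\cos(c_\delta x^{1/\delta}) - \tfrac{1}{c_\delta}\sin(c_\delta x^{1/\delta})\Big]dx
  &= \Gamma(p)(\cos\theta)^{p}(-1)^k\Big[\cos\theta - \frac{\sin\theta}{\tan\theta}\Big] = 0 .
\end{align*}
```

The same notation explains the restriction on $\varepsilon$: since $|\cos z - (1/c_\delta)\sin z| \le (1 + c_\delta^{-2})^{1/2} = 1/\sin(\pi/\delta)$, the condition $|\varepsilon| < \sin(\pi/\delta)$ keeps $f_\varepsilon$ non-negative.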

For completeness we have to consider the case $\delta \in (0, 2]$. Since $m_k = E[(\xi^\delta)^k] = \Gamma(\delta k + 1)$, we can use the properties of the gamma function $\Gamma(\cdot)$ and show that the Carleman condition is satisfied. Thus we conclude that if $\xi \sim Exp(1)$ and $0 < \delta \le 2$, then the distribution of $\xi^\delta$ is uniquely determined by its moment sequence (also see Example 11.4).


11.4. A class of hyper-exponential distributions with an 
indeterminate moment problem 


Recall first that the one-sided hyper-exponential distribution $H^+(a, b, c)$, where $a, b, c$ are positive numbers, is given by the density function

$$f(x) = \begin{cases} \dfrac{c\, b^{-a/c}}{\Gamma(a/c)}\, x^{a-1} \exp(-x^c/b), & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases} \tag{1}$$

(Notice that the gamma distribution $\gamma(a, b)$ and the exponential distribution $Exp(\lambda)$ are special cases of the hyper-exponential distributions.)

It can be shown (for details see Hoffmann-Jorgensen 1994) that if $X$ is a r.v., $X \sim H^+(a, b, c)$, then the quantity $E[X^k]$ does exist for any $k \ge 0$ (not just for integer $k$) and, moreover,

$$E[X^k] = b^{k/c}\, \Gamma\left(\frac{a+k}{c}\right) \Big/ \Gamma\left(\frac{a}{c}\right).$$

Hence the r.v. $X$ has finite moments $m_k = E[X^k]$, $k = 1, 2, \ldots$, and the question is whether or not the moment sequence $\{m_k\}$ determines uniquely the distribution of $X$.


Let us take some $a > 0$, $b > 0$ and $0 < c < \tfrac12$. Then we can choose $p > 0$ such that $\eta := a + p$ is an integer number, set $r = p/c$, $\lambda = r + 1/b$, $s = \tan(c\pi)$ and introduce the function

$$\psi(u) = u^{p} \exp(-r u^{c}) \sin(\lambda s u^{c}), \quad u > 0.$$

Since $e^{-x} < r^{r} x^{-r}$ for all $x > 0$, we easily see that $|\psi(u)| < 1$, $u > 0$. Let $k$ be a fixed non-negative integer. Then $n = k + \eta$ is an integer, $\nu = c\pi \in (0, \tfrac{\pi}{2})$ and substituting $x = u^c$ yields

$$c\int_0^\infty u^{k+a-1}\psi(u)\exp(-u^c/b)\,du = \int_0^\infty x^{(n/c)-1} e^{-\lambda x}\sin(\lambda s x)\,dx = 0.$$


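This implies that for any non-negative integer $k$ and any real $\varepsilon$ the following relation holds:

$$\int_0^\infty u^k f_\varepsilon(u)\,du = \int_0^\infty u^k f(u)\,du$$

where $f$ is the hyper-exponential density (1) and

$$f_\varepsilon(x) = \begin{cases} f(x)\{1 + \varepsilon\psi(x)\}, & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases}$$

Since $|\psi(x)| < 1$, $x > 0$, it is easy to see that for any $\varepsilon \in [-1, 1]$, $f_\varepsilon$ is a probability density function of a r.v. $X_\varepsilon$ and

$$E[X_\varepsilon^k] = E[X^k], \quad k = 1, 2, \ldots$$

despite the fact that $f_\varepsilon \ne f$ (except the trivial case $\varepsilon = 0$).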
Therefore for $a > 0$, $b > 0$ and $c \in (0, \tfrac12)$ the hyper-exponential distribution $H^+(a, b, c)$ is not determined uniquely by its moment sequence. It can be shown, however, that for $a > 0$, $b > 0$ and $c \ge \tfrac12$, the moment problem for $H^+(a, b, c)$ has a unique solution (Berg 1988).
Since for the exponential distribution $Exp(\lambda)$, the gamma distribution $\gamma(a, b)$ and the hyper-exponential distribution $H^+(a, b, c)$ we have

$$Exp(\lambda) = \gamma(1, 1/\lambda), \quad \gamma(a, b) = H^+(a, b, 1)$$

it follows that if $X \sim Exp(\lambda)$ or $X \sim \gamma(a, b)$, where $\lambda > 0$, $a > 0$ and $b > 0$ are arbitrary, then the distribution of $X^\delta$ for $\delta > 2$ is not determined uniquely by the corresponding moment sequence. (Also see Example 11.3.)


11.5. Different distributions with equal absolute values of the 
characteristic functions and the same moments of all 
orders 


Let us start with the number sequence $\{a_k, k = 1, 2, \ldots\}$ where $a_k > 0$ and $a = \sum_{k=1}^\infty a_k < \infty$. We shall consider a special sequence of distributions and study the corresponding sequence of the ch.f.s. The uniform distribution on $[-a_k, a_k]$ has a ch.f. $\sin(a_k t)/(a_k t)$. Denote by $f_k(x)$, $x \in [-2a_k, 2a_k]$ the convolution of this distribution with itself. Then the ch.f. $\varphi_k$ of $f_k$ is $\varphi_k(t) = [\sin(a_k t)/(a_k t)]^2$, $k = 1, 2, \ldots$, $t \in \mathbb{R}^1$. The product $\prod_{k=1}^{n} \varphi_k(t)$ converges, as $n \to \infty$, to the function $\varphi(t)$ where

$$\varphi(t) = \prod_{k=1}^\infty \varphi_k(t), \quad t \in \mathbb{R}^1. \tag{1}$$


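Using the Taylor expansion of $\sin x$ for small $x$, we find that

$$\varphi(t) \approx \exp\left(-\tfrac13 \alpha^2 t^2\right) \quad \text{as } t \to 0$$

where $\alpha^2 = \sum_{k=1}^\infty a_k^2$. Therefore according to Lukacs (1970, Th. 3.6.1), $\varphi(t)$, $t \in \mathbb{R}^1$ is a ch.f. The d.f. corresponding to $\varphi$ is absolutely continuous. Let its density be denoted by $f$. Clearly $f$ is an infinite convolution: that is, $f = f_1 * f_2 * \cdots$. By the inversion formula (Feller 1971; Lukacs 1970; Shiryaev 1995) we find that

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-itx)\,\varphi(t)\,dt, \quad x \in \mathbb{R}^1.$$

Since $\varphi(t) \ge 0$ for all $t \in \mathbb{R}^1$, we can construct the density $f_0$ by setting $f_0(x) = (2\pi f(0))^{-1}\varphi(x)$, $x \in \mathbb{R}^1$. If $\varphi_0$ denotes the ch.f. of $f_0$ then

$$\varphi_0(t) = \int_{-\infty}^\infty \exp(itx) f_0(x)\,dx = (2\pi f(0))^{-1}\int_{-\infty}^\infty \exp(-itx)\varphi(x)\,dx = f(t)/f(0).$$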


Note especially that the support of $\varphi_0$ is contained in the interval $[-2a, 2a]$. Using the function $\varphi_0$ and the function $\varphi$ from (1) we define the following four functions:

$$\psi_1(t) = \varphi_0(t) + \tfrac12[\varphi_0(t + 4a) + \varphi_0(t - 4a)], \tag{2}$$

$$\psi_2(t) = \varphi_0(t) - \tfrac12[\varphi_0(t + 4a) + \varphi_0(t - 4a)], \tag{3}$$

$$g_1(x) = (2\pi f(0))^{-1}\varphi(x)(1 + \cos 4ax), \quad g_2(x) = (2\pi f(0))^{-1}\varphi(x)(1 - \cos 4ax). \tag{4}$$

The above reasoning shows that $g_1(x)$, $x \in \mathbb{R}^1$ and $g_2(x)$, $x \in \mathbb{R}^1$ are probability density functions. Moreover, $\psi_1$ and $\psi_2$ given by (2) and (3) are just their ch.f.s. Also $|\psi_1(t)| = |\psi_2(t)|$ for each $t \in \mathbb{R}^1$.

Denote by $X_1$ and $X_2$ r.v.s with densities $g_1$ and $g_2$ respectively. If $c_n = (\pi f(0))^{-1}\prod_{k=1}^n a_k^{-2}$, then it is easy to derive from (4) that $|g_1(x)| \le c_n |x|^{-2n}$ for each $n \in \mathbb{N}$. The same estimation holds for $g_2$. Hence both variables $X_1$ and $X_2$ possess moments of all orders. As a consequence we obtain (see Feller 1971) that the ch.f.s $\psi_1$ and $\psi_2$ have derivatives of all orders, and, since $|\psi_1(t)| = |\psi_2(t)|$, $\psi_1(t) = \psi_2(t)$ for $t$ in a small neighbourhood of 0. This implies that the moments of $X_1$ and $X_2$ of all orders coincide. Looking again at the pairs $(g_1, g_2)$, $(\psi_1, \psi_2)$ and $(X_1, X_2)$ we conclude that $|\psi_1(t)| = |\psi_2(t)|$ for all $t \in \mathbb{R}^1$, $E[X_1^k] = E[X_2^k]$ for each $k \in \mathbb{N}$ but nevertheless $g_1 \ne g_2$.


11.6. Another class of absolutely continuous distributions 
which are not determined uniquely by their moments 


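Consider the r.v. $X$ with the following density:

$$f(x) = \begin{cases} c\exp(-a x^\lambda), & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases}$$

Here $a > 0$, $0 < \lambda < \tfrac12$ and $c$ is a norming constant.
For $\varepsilon \in (-1, 1)$ and $b = a\tan\lambda\pi$ define the function $f_\varepsilon$ by

$$f_\varepsilon(x) = \begin{cases} c\exp(-a x^\lambda)(1 + \varepsilon\sin(b x^\lambda)), & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases}$$

Obviously $f_\varepsilon(x) \ge 0$, $x \in \mathbb{R}^1$. Next we shall use the relation

$$\int_0^\infty x^n \exp(-a x^\lambda)\sin(b x^\lambda)\,dx = 0. \tag{1}$$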


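Let us establish the validity of (1). If $p > 0$ and $q$ is a complex number with $\mathrm{Re}\, q > 0$, then we use the well known identity

$$\int_0^\infty t^{p-1} e^{-qt}\,dt = \Gamma(p)/q^p.$$

Denoting $p = (n + 1)/\lambda$, $q = a + ib$, $t = x^\lambda$ we find

$$\begin{aligned}
\int_0^\infty x^{\lambda[(n+1)/\lambda - 1]} \exp[-(a + ib)x^\lambda]\,\lambda x^{\lambda - 1}\,dx &= \lambda\int_0^\infty x^n \exp[-(a + ib)x^\lambda]\,dx \\
&= \lambda\int_0^\infty x^n\exp(-a x^\lambda)\cos(b x^\lambda)\,dx - i\lambda\int_0^\infty x^n \exp(-a x^\lambda)\sin(b x^\lambda)\,dx \\
&= \Gamma((n+1)/\lambda)\Big/\left[a^{(n+1)/\lambda}(1 + i\tan\lambda\pi)^{(n+1)/\lambda}\right].
\end{aligned}$$

The last ratio is real-valued because $\sin[\pi(n + 1)] = 0$ and

$$(1 + i\tan\lambda\pi)^{(n+1)/\lambda} = (\cos\lambda\pi)^{-(n+1)/\lambda}(\cos\lambda\pi + i\sin\lambda\pi)^{(n+1)/\lambda} = (\cos\lambda\pi)^{-(n+1)/\lambda} e^{i\pi(n+1)} = (\cos\lambda\pi)^{-(n+1)/\lambda}\cos\pi(n + 1).$$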
Thus (1) is proved. Taking $n = 0$ we see that $f_\varepsilon$ is a probability density function. Denote by $X_\varepsilon$ a r.v. whose density is $f_\varepsilon$. The relationship between $f_\varepsilon$ and $f$, together with (1), imply that

$$E[X_\varepsilon^n] = E[X^n] \quad \text{for each } n = 1, 2, \ldots.$$

Therefore we have constructed infinitely many r.v.s $X_\varepsilon$ with the same moments as those of $X$ though their densities $f_\varepsilon$ and $f$ are different ($f_\varepsilon = f$ only if $\varepsilon = 0$). So in this case the moment problem is indeterminate. However, this fact is not surprising because the density $f$ does satisfy criterion (C3).


11.7. Two different discrete distributions on a subset of natural 
numbers both having the same moments of all orders 


Let $q \ge 2$ be a fixed natural number and $M_q = \{q^j : j = 0, 1, 2, \ldots\}$. Clearly $M_q \subset \mathbb{N}$ for $q = 2, 3, \ldots$. If $n \in M_q$ then $n$ has the form $q^j$ and we can define $p_n$ by $p_n = e^{-q} q^j/j!$. It is easy to see that $\{p_n\}$ is a discrete probability distribution over the set $M_q$. Denote by $X$ a r.v. with values in $M_q$ and a distribution $\{p_n\}$. In this case we say that $X$ has a log-Poisson distribution. Then the $k$th-order moment $m_k$ of $X$ is

$$m_k = E[X^k] = \sum_{j=0}^\infty e^{-q} q^{kj} q^j/j! = \exp[q(q^k - 1)] < \infty.$$


Our purpose now is to construct many other r.v.s with the same moments. Consider the function $h(z) = \prod_{n=1}^\infty (1 - z q^{-n})$. Since $\sum_{n=1}^\infty q^{-n} < \infty$ for $q > 1$ then $h(z)$ is an analytic function in the whole complex plane. Let $h(z) = \sum_{j=0}^\infty c_j z^j$ be its Taylor expansion around 0. Taking into account the equality $h(qz) = (1 - z)h(z)$ we have the relation $c_j/c_{j-1} = -(q^j - 1)^{-1}$ where $c_0 = 1$ and for $j \ge 1$ we find

$$c_j = (-1)^j[(q - 1)(q^2 - 1)\cdots(q^j - 1)]^{-1}.$$

Setting $a_j = j!\,c_j$ we see that $|a_j| \le 1$ for all $j$. This implies that

$$\sum_{j=0}^\infty e^{-q} q^{kj} a_j q^j/j! = e^{-q} h(q^{k+1}) = 0 \quad \text{for all } k = 0, 1, 2, \ldots. \tag{1}$$


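Now introduce the number set $\{p_n^{(\varepsilon)}, n \in M_q\}$ where

$$p_n^{(\varepsilon)} := p_n(1 + \varepsilon a_j) = e^{-q} q^j\left[(j!)^{-1} + \varepsilon(-1)^j((q - 1)(q^2 - 1)\cdots(q^j - 1))^{-1}\right], \quad n = q^j.$$

Here $\varepsilon$ is any number in the interval $[-1, 1]$. Obviously $p_n^{(\varepsilon)} \ge 0$ and (1) implies that $\sum_{n \in M_q} p_n^{(\varepsilon)} = 1$ for any $\varepsilon \in [-1, 1]$. Therefore $\{p_n^{(\varepsilon)}\}$ defines a discrete probability distribution over the set $M_q$. Let $X_\varepsilon$ be a r.v. with values in $M_q$ and a distribution $\{p_n^{(\varepsilon)}\}$. Using (1) again we conclude that $E[X_\varepsilon^k] = E[X^k]$ for each $k = 1, 2, \ldots$ and $\varepsilon \in [-1, 1]$.

So, excluding the trivial case $\varepsilon = 0$, we have constructed discrete r.v.s $X$ and $X_\varepsilon$ whose distributions are different but whose moments of all orders coincide.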
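Identity (1) of this example, on which the whole construction rests, can be explored with exact rational arithmetic. The sketch below builds the coefficients $c_j$ from the recursion $c_j/c_{j-1} = -(q^j - 1)^{-1}$ and evaluates a partial sum of $\sum_j c_j q^{(k+1)j} = h(q^{k+1})$. The function names and the truncation at 40 terms are assumptions made for illustration; a partial sum can only suggest, not prove, that the full series vanishes.

```python
import math
from fractions import Fraction

def taylor_coeffs(q, terms):
    """c_0, ..., c_{terms-1} of h(z) = prod_{n>=1} (1 - z q^(-n))."""
    coeffs = [Fraction(1)]
    for j in range(1, terms):
        coeffs.append(-coeffs[-1] / (q**j - 1))   # c_j = -c_{j-1} / (q^j - 1)
    return coeffs

def identity_residual(q, k, terms=40):
    """Partial sum of sum_j c_j q^((k+1) j), which identity (1) says tends to 0."""
    c = taylor_coeffs(q, terms)
    return float(sum(c[j] * Fraction(q) ** ((k + 1) * j) for j in range(terms)))

def perturbation_is_admissible(q, terms=20):
    """Check |a_j| = |j! c_j| <= 1, so that p_n (1 + eps a_j) >= 0 for |eps| <= 1."""
    c = taylor_coeffs(q, terms)
    return all(abs(math.factorial(j) * c[j]) <= 1 for j in range(terms))

if __name__ == "__main__":
    for k in range(4):
        print(k, identity_residual(2, k))
    print(perturbation_is_admissible(2))
```

Because the terms decay like $q^{-j(j+1)/2}$, a few dozen terms already bring the partial sums far below double-precision resolution for small $k$.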


11.8. Another family of discrete distributions with the same 
moments of all orders 


Let $N = \{0, \pm 1, \pm 2, \ldots\}$ and $X$ be a r.v. with the following distribution:

$$P[X = e^{8n}] = c\,e^{-n^2}, \quad n \in N, \quad c^{-1} = \sum{}^{*} e^{-n^2}.$$

Here and below $\sum^{*}$ is the sum over all $n \in N$. For any positive integer $k$ we can calculate explicitly the moment $m_k = E[X^k]$ of order $k$, namely

$$m_k = c\sum{}^{*} e^{8kn} e^{-n^2} = c\,e^{16k^2}\sum{}^{*}\exp[-(n - 4k)^2] = e^{16k^2}.$$


Now we shall construct a family consisting of infinitely many 
r.v.s with the same moments. 
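For any $\varepsilon \in (0, 1)$ define the function

$$h_\varepsilon(n) = \begin{cases} 0, & \text{if } n \equiv 0 \pmod 4 \\ \varepsilon, & \text{if } n \equiv 1 \pmod 4 \\ 0, & \text{if } n \equiv 2 \pmod 4 \\ -\varepsilon, & \text{if } n \equiv 3 \pmod 4. \end{cases}$$

In the sequel we use the evident properties: for any fixed $\varepsilon$, $h_\varepsilon(n)$ is an odd function of $n$, that is $h_\varepsilon(-n) = -h_\varepsilon(n)$; $h_\varepsilon(n)$ is a periodic function of period equal to 4; $h_\varepsilon(n + 4k)$ is an odd function in $n$ for each integer $k$.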
The next crucial step is to evaluate the sum $S_k$ where

$$S_k = c\sum{}^{*} e^{8kn} h_\varepsilon(n) e^{-n^2}, \quad k = 0, 1, 2, \ldots.$$

We have

$$S_k = c\sum{}^{*} h_\varepsilon(n)\exp(8kn - n^2) = c\exp(16k^2)\sum{}^{*} h_\varepsilon(n)\exp[-(n - 4k)^2] = c\exp(16k^2)\sum{}^{*} h_\varepsilon(u + 4k)\exp(-u^2).$$

The last sum is zero because $h_\varepsilon(u + 4k)$ is an odd function of $u$ for all $k = 0, 1, 2, \ldots$. Thus we have established that

$$S_k = 0 \quad \text{for all } k = 0, 1, 2, \ldots \text{ and all } \varepsilon \in (0, 1). \tag{1}$$


As a consequence of (1) we derive that

$$q_\varepsilon(n) = c\,e^{-n^2}(1 - h_\varepsilon(n)) > 0$$

for each $n \in N$ and any $\varepsilon \in (0, 1)$ and moreover $\sum^{*} q_\varepsilon(n) = 1$. This means that the set of numbers $\{q_\varepsilon(n), n \in N\}$ can be regarded as a discrete probability distribution of some r.v. which will be denoted by $X_\varepsilon$.

Thus we have constructed a r.v. $X_\varepsilon$ whose values are the same as those of $X$ but whose distribution is

$$P[X_\varepsilon = e^{8n}] = c\,e^{-n^2}(1 - h_\varepsilon(n)).$$

Since (1) is satisfied for any $k = 1, 2, \ldots$ we find that

$$E[X_\varepsilon^k] = E[X^k] = \exp(16k^2).$$

Therefore for any $\varepsilon \in (0, 1)$ we have $X_\varepsilon \overset{d}{\ne} X$ but nevertheless $X_\varepsilon$ and $X$ have the same moments of all orders.
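A short numerical check makes the cancellation behind (1) visible. The sketch below evaluates $E[X_\varepsilon^k]/\exp(16k^2) = c\sum^{*}(1 - h_\varepsilon(n))\exp[-(n - 4k)^2]$, summing symmetrically around $n = 4k$. The cutoff and the function names are assumptions made for this illustration.

```python
import math

def h(n, eps):
    """Period-4 odd perturbation h_eps(n): values 0, eps, 0, -eps for n = 0, 1, 2, 3 (mod 4)."""
    return (0.0, eps, 0.0, -eps)[n % 4]

def shifted_moment_ratio(k, eps, cutoff=40):
    """Truncated E[X_eps^k] / exp(16 k^2), summed over |n - 4k| <= cutoff."""
    inv_c = sum(math.exp(-n * n) for n in range(-cutoff, cutoff + 1))   # 1/c
    centre = 4 * k
    total = sum(
        (1.0 - h(n, eps)) * math.exp(-((n - centre) ** 2))
        for n in range(centre - cutoff, centre + cutoff + 1)
    )
    return total / inv_c

if __name__ == "__main__":
    for k in range(4):
        print(k, shifted_moment_ratio(k, 0.5))
```

Python's `%` returns a non-negative remainder for negative `n`, so `h(-1, eps)` is `-eps`, in agreement with the oddness of $h_\varepsilon$.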


11.9. On the relationship between two sufficient conditions for 
the determination of the moment problem 


(i) Let $Z = X\log(1 + Y)$ where $X$ and $Y$ are independent r.v.s each distributed exponentially with parameter 1. Obviously $Z$ is absolutely continuous and takes non-negative values. The $n$th moment $m_n$ of $Z$ is

$$m_n = n!\,\nu_n, \quad \text{with } \nu_n = \int_0^\infty [\log(1 + x)]^n e^{-x}\,dx.$$

It can be shown (the details are left to the reader) that

$$c^{-1}\log(1 + n) \le (\nu_n)^{1/n} \le c\log(1 + n), \quad c = \text{constant}. \tag{1}$$

Thus

$$(m_n/n!)^{1/n} = (\nu_n)^{1/n} \ge c^{-1}\log(1 + n) \to \infty \quad \text{as } n \to \infty$$

which implies that the series $\sum_{n=1}^\infty m_n t^n/n!$ does not converge for any $t \ne 0$. Therefore we cannot apply Criterion (C1) (see the introductory notes to this section) to decide whether the d.f. of $Z$ is determined by its moments.

From (1) we obtain that for large $n$,

$$(m_n)^{1/n} = (n!)^{1/n}(\nu_n)^{1/n} \le c\,n\log(1 + n)$$

and hence $\sum_{n=1}^\infty (m_{2n})^{-1/(2n)} = \infty$ because the series $\sum_{j=1}^\infty 1/(j\log(1 + j))$ is divergent.
Therefore, according to Criterion (C2), the d.f. $F$ of the r.v. $Z$ is determined uniquely by its moments. Note, however, that we used the Carleman criterion (C2) whereas Criterion (C1), based on the existence of the m.g.f., does not help in this case.

(ii) In case (i) we considered an absolutely continuous r.v. Here we take a discrete r.v. and tackle the same questions as raised in case (i).

So, let $Z$ be a r.v. whose set of values is $\{3, 4, 5, \ldots\}$ and whose distribution is given by

$$P[Z = j] = c\exp(-j/\log j), \quad j = 3, 4, \ldots$$

where $c$ is a norming constant obtained from the condition $P[Z \ge 3] = 1$.
Our purpose is to verify whether Criteria (C1) and (C2) apply. Firstly we derive suitable upper and lower bounds for the moment $m_{2n}$ of order $2n$.


Introducing the function

$$h(x) = x^n\exp(-x/\log x)$$

where $x \ge 3$, $n \ge 4$, we can easily show that

$$\frac{h'(x)}{h(x)} = \frac{n}{x} - \frac{1}{\log x} + \frac{1}{\log^2 x} = 0$$

iff $x = x_n$ where

$$\frac{n}{x_n} = \frac{1}{\log x_n}\left(1 - \frac{1}{\log x_n}\right).$$

Since $n$ and $x_n$ tend to infinity simultaneously, we have $(n\log x_n)/x_n \to 1$ whence $\tfrac12 n\log n < x_n < 2n\log n$ for $n \ge n_0$. If we define $M_n$ by

$$M_n = \max_{x \ge 3}[x^n\exp(-x/\log x)] = x_n^n\exp(-x_n/\log x_n)$$

then for $n > n_0$ we obtain the following estimate:

$$m_{2n} = c\sum_{j=3}^\infty j^{2n}e^{-j/\log j} \le cM_{2n+2}\sum_{j=3}^\infty j^{-2} < cM_{2n+2} < c[(4n + 4)\log(2n + 2)]^{2n+2}e^{-2n-2}.$$

Therefore for all sufficiently large $n$

$$(m_{2n})^{1/(2n)} \le 4c^{1/(2n)}[(2n + 2)\log(2n + 2)]^{(n+1)/n} \le 8(2n + 2)\log(2n + 2). \tag{2}$$

On the other hand,

$$(m_{2n})^{1/(2n)} \ge (cM_{2n})^{1/(2n)} \ge e^{-1}\left(\tfrac12 n\log n\right) \quad \text{for all large } n. \tag{3}$$

Now using (2) and (3) we easily find that

$$\lim_{n \to \infty}\left[(m_{2n})^{1/(2n)}/(2n)\right] = \infty \quad \text{and} \quad \sum_{n=1}^\infty (m_{2n})^{-1/(2n)} = \infty.$$

Therefore the ch.f. of the r.v. $Z$ is not analytic and we cannot apply Criterion (C1) to say whether the moment sequence $\{m_k\}$ determines uniquely the distribution of $Z$. However, the Carleman criterion (C2) guarantees that the distribution of $Z$ is uniquely determined by its moments.


11.10. The Carleman condition is sufficient but not necessary 
for the determination of the moment problem 


In two different ways we now illustrate that the Carleman 
condition is not necessary for the moment problem to be 
determined. 


(i) Let $F_H$ be a symmetric distribution on $(-\infty, \infty)$ and $F_S$ a distribution on $[0, \infty)$. (The subscripts H and S correspond to the Hamburger and the Stieltjes cases.) By the relations

$$F_H(x) = \begin{cases} \tfrac12[1 + F_S(x^2)], & \text{if } x \ge 0 \\ \tfrac12[1 - F_S(x^2)], & \text{if } x < 0 \end{cases}$$

we can define a one-one correspondence between the set of symmetric distributions on $(-\infty, \infty)$ and the set of distributions on $[0, \infty)$. It is clear that $F_H$ possesses moments $\{\tilde m_n\}$ of all orders iff $F_S$ possesses moments $\{m_n\}$ of all orders. In this case

$$\tilde m_{2n} = m_n, \quad \tilde m_{2n+1} = 0, \quad n = 0, 1, 2, \ldots.$$

Thus we conclude that the Hamburger problem for $\{\tilde m_n\}$ is determinate iff the corresponding Stieltjes problem is determinate. Moreover the Carleman condition $\sum(\tilde m_{2n})^{-1/(2n)} = \infty$ for the determination of the Hamburger case becomes $\sum(m_n)^{-1/(2n)} = \infty$ in the Stieltjes case. We shall use this result later but let us now formulate the following result (see Heyde 1963b): if a set $\{m_n\}$ of moments corresponds to a determinate Stieltjes problem, the solution of which has no point of discontinuity at the origin, then the set $\{m_n\}$ also corresponds to a determinate Hamburger problem.
Consider the r.v. $X$ with density $f$ given by

$$f(x) = \begin{cases} \dfrac{\beta}{\Gamma(1/\beta)}\exp(-x^\beta), & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases}$$

where $0 < \beta < 1$. One can show that $m_n = E[X^n] = \Gamma((n + 1)/\beta)/\Gamma(1/\beta)$, $n = 0, 1, 2, \ldots$, so $(m_n)^{1/n} \sim Kn^{1/\beta}$ for some constant $K$. Then $\sum(m_n)^{-1/(2n)} = \infty$ for $\tfrac12 \le \beta < 1$, and by the Carleman criterion (for the Stieltjes case) the Stieltjes problem for these moments is determinate for $\tfrac12 \le \beta < 1$. Since the distribution with density $f$ has no discontinuity at the origin, from the above result we conclude that the Hamburger problem corresponding to the moments $m_n = \Gamma((n + 1)/\beta)/\Gamma(1/\beta)$, $n = 0, 1, 2, \ldots$ with $\tfrac12 \le \beta < 1$ is also determinate. However, it is easy to check that $\sum(m_{2n})^{-1/(2n)} < \infty$ for $0 < \beta < 1$. Hence the Carleman condition is not necessary for the determination of the moment problem (on $(-\infty, \infty)$).
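The two Carleman series in this example differ only in how fast their terms decay, and that decay rate can be estimated numerically. The sketch below is a heuristic illustration, not a proof: using `math.lgamma` it computes $\log m_n$ and estimates the exponent $p$ for which the Carleman term behaves like $n^{-p}$ between two large indices. The function names and the indices $n_1$, $n_2$ are assumptions. A series with terms of order $n^{-p}$ diverges for $p \le 1$ and converges for $p > 1$.

```python
import math

def log_moment(n, beta):
    """log m_n with m_n = Gamma((n + 1)/beta) / Gamma(1/beta)."""
    return math.lgamma((n + 1) / beta) - math.lgamma(1 / beta)

def decay_exponent(beta, hamburger, n1=10_000, n2=100_000):
    """Estimate p such that the n-th Carleman term is roughly n^(-p) on [n1, n2].

    Stieltjes form: (m_n)^(-1/(2n)); Hamburger form: (m_{2n})^(-1/(2n)).
    """
    def log_term(n):
        order = 2 * n if hamburger else n
        return -log_moment(order, beta) / (2 * n)
    return -(log_term(n2) - log_term(n1)) / (math.log(n2) - math.log(n1))

if __name__ == "__main__":
    beta = 0.75
    print("Stieltjes exponent:", decay_exponent(beta, hamburger=False))
    print("Hamburger exponent:", decay_exponent(beta, hamburger=True))
```

For $\beta = 0.75$ the estimates should be close to $1/(2\beta) \approx 0.67$ and $1/\beta \approx 1.33$, which matches the divergence of the Stieltjes series and the convergence of the Hamburger series claimed in the text.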


(ii) Now we shall use the following interesting and intuitively unexpected result (see Heyde 1963b): let the moments $\{1, m_1, m_2, \ldots\}$ correspond to a determinate Stieltjes problem. After a mass $\varepsilon$, $0 < \varepsilon < 1$, has been added at the origin and the distribution has been renormalized, it is possible for the new set of moments $\{1, m_1(1 + \varepsilon)^{-1}, m_2(1 + \varepsilon)^{-1}, \ldots\}$ to correspond to an indeterminate Stieltjes problem.

So, let $\{1, m_1, m_2, \ldots\}$ and $\{1, m_1(1 + \varepsilon)^{-1}, m_2(1 + \varepsilon)^{-1}, \ldots\}$, $0 < \varepsilon < 1$, be sets of moments corresponding respectively to a determinate and an indeterminate Stieltjes problem. Suppose the Carleman condition is necessary for the determination of the moment problem. Then we should have $\sum_n (m_n)^{-1/(2n)} = \infty$, which is impossible because $\sum_n (m_n)^{-1/(2n)}(1 + \varepsilon)^{1/(2n)} < \infty$.


11.11. The Krein condition is sufficient but not necessary for 
the moment problem to be indeterminate 


As mentioned before the Krein condition is sufficient for the 
moment problem to be indeterminate. Let us consider examples 
showing the role of this condition. 


(i) Let $X$ be a r.v., $X \sim N(0, \tfrac12)$, and $\delta > 0$. Then the density of the r.v. $|X|^\delta$ is

$$f_\delta(x) = \frac{2}{\delta\sqrt{\pi}}\, x^{1/\delta - 1}\exp(-x^{2/\delta}), \ \text{if } x > 0; \quad f_\delta(x) = 0, \ \text{if } x \le 0.$$

All moments $E[(|X|^\delta)^k]$, $k = 1, 2, \ldots$, exist. Berg (1988) has shown that the distribution of $|X|^\delta$ is determinate for $\delta \le 4$ and indeterminate for $\delta > 4$. Berg did not use the Krein condition. For the density $f_\delta$, $\delta > 4$, we find that $\int_0^\infty\{-\log f_\delta(x^2)/(1 + x^2)\}\,dx < \infty$, i.e. the Krein condition is satisfied, and this is the easiest way to show that the moment problem is indeterminate.


(ii) Take the function $h(x) = \exp(-x^\gamma)$, if $x \ge 0$ and $h(x) = \exp(x)$, if $x < 0$. Here $\gamma \in (0, \tfrac12)$ and let $c_\gamma$ be a constant such that $g_\gamma(x) = c_\gamma h(x)$, $x \in \mathbb{R}^1$ is a probability density. If $Y$ is a r.v. with density $g_\gamma$, then all moments $E[Y^k]$, $k = 1, 2, \ldots$, exist. Moreover, $\int_{-\infty}^\infty\{-\log g_\gamma(x)/(1 + x^2)\}\,dx = \infty$. Hence the Krein condition is not satisfied but the distribution of $Y$ is indeterminate as follows from Example 11.6.


11.12. An indeterminate moment problem and non-symmetric 
distributions whose odd-order moments all vanish 


In Example 6.5 we described a r.v. $Y$ such that $m_{2n+1} = E[Y^{2n+1}] = 0$ for all $n = 0, 1, 2, \ldots$ but the distribution of $Y$ is non-symmetric.
However, we did not discuss the reason for this fact. Let us note 
that the distribution of the r.v. Y in Example 6.5 is not determined 
uniquely by its moments. Now we shall show that the vanishing of 
the odd-order moments of a non-symmetric distribution is closely 
related to indeterminate Stieltjes problems. 

From Example 11.7 we know that there are indeterminate Stieltjes problems. Let the d.f.s $F_1$ and $F_2$ be two distinct solutions of such a problem for a given set of moments $\{1, m_1, m_2, \ldots\}$. Then

$$F(x) = \tfrac12 F_1(x) + \tfrac12[1 - F_2(-x - 0)], \quad x \in \mathbb{R}^1$$

is a d.f. which evidently is non-symmetric. Moreover, $F$ has the following moments: $1, 0, m_2, 0, m_4, 0, \ldots$. Therefore any odd-order moment of $F$ is zero despite its non-symmetry.
Finally, let us present one additional example based on the 
lognormal distribution considered in Example 11.2. Once again, let 


$$f(x) = (2\pi)^{-1/2}x^{-1}\exp\left[-\tfrac12(\log x)^2\right], \quad f_1(x) = f(x)[1 - \sin(2\pi\log x)], \quad x > 0.$$

Denote by $Z$ a r.v. whose density $g$ is defined as follows:

$$g(x) = \begin{cases}\tfrac12 f(x), & \text{if } x \ge 0 \\ \tfrac12 f_1(-x), & \text{if } x < 0.\end{cases}$$

Then one can check that all the moments of $Z$ are finite, all odd-order moments $E[Z^{2n+1}]$ are zero but $Z$ is non-symmetric.


11.13. A non-symmetric distribution with vanishing odd- 
order moments can coincide with the normal 
distribution only partially 


Let us recall that in general no probability distribution is
determined by a finite number of moments. The previous examples 
show that the distribution cannot be determined uniquely even if 
we know all (and hence an infinite number of) its moments. 
However, if we specify the class of distributions, then a member of 
this class could be determined by a finite number of moments. For 
example, a member of the so-called class of Pearson distributions 
is specified by a knowledge of at most four moments (Feller 1971; 
Heyde 1975). Certainly we have to indicate the normal distribution $N(a, \sigma^2)$ which is determined uniquely by its first two moments only. Thus we come to the following question: does there exist a r.v. $X$ such that for infinitely many $k$, but not for all $k \ge 1$, we have $E[X^k] = E[Z^k]$ where $Z \sim N(a, \sigma^2)$ but nevertheless $X \overset{d}{\ne} Z$?

Let $Z$ be a r.v. distributed $N(0, 1)$. We shall construct a r.v. $X$ such that

$$E[X^{2k+1}] = E[Z^{2k+1}],\ k = 0, 1, 2, \ldots, \quad E[X^2] = E[Z^2], \quad E[X^4] = E[Z^4] \quad \text{but } X \overset{d}{\ne} Z. \tag{1}$$

If (1) holds we can speak about a partial, but not full, coincidence of the distributions of $X$ and $Z$.
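So, let $Y_1$ be a r.v. with density

$$g(x) = \tfrac{1}{48}c^4\exp(-c|x|^{1/4})[1 - \varepsilon\,\mathrm{sign}\,x\,\sin(c|x|^{1/4})], \quad x \in \mathbb{R}^1$$

where $c > 0$, $\varepsilon \ne 0$, $|\varepsilon| < 1$. Obviously $g$ is non-symmetric. The moments $m_k = E[Y_1^k]$ can be calculated explicitly, namely

$$m_{2k+1} = 0, \quad m_{2k} = \tfrac16 c^{-8k}(8k + 3)!, \quad k = 0, 1, 2, \ldots$$

(see also Example 6.5). By choosing $c = (11!/6)^{1/8}$ we get

$$m_2 = E[Y_1^2] = 1.$$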
Take now a r.v. $Y_2$ which is independent of $Y_1$ and takes the values 1 and $-1$ with probability $\tfrac12$ each. For some constant $\beta$, $0 < \beta < 1$ which will be specified later, put

$$X = \beta^{1/2}Y_2 + (1 - \beta)^{1/2}Y_1.$$

Clearly the distribution of $X$ is non-normal and non-symmetric,

$$E[X^{2k+1}] = 0 = E[Z^{2k+1}] \ \text{for each } k = 0, 1, 2, \ldots, \quad E[X^2] = 1 = E[Z^2].$$

Finally we find

$$E[X^4] = \beta^2 + 6\beta(1 - \beta) + (1 - \beta)^2E[Y_1^4].$$

It remains to choose $\beta$ such that $E[X^4] = 3 = E[Z^4]$. Indeed, if the kurtosis coefficient of the r.v. $Y_1$ is $\gamma_2 = E[Y_1^4] - 3$, then take $\beta = (\gamma_2 - (2\gamma_2)^{1/2})/(\gamma_2 - 2)$. Since $c$ was already fixed ($c = (11!/6)^{1/8}$) then $\gamma_2$ and hence $\beta$ have definite values.

Thus we have constructed a r.v. $X$ which coincides partially but not fully with the standard normal r.v. $Z$ in the sense of (1).
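The "definite values" of $\gamma_2$ and $\beta$ can be computed explicitly. The sketch below assumes the moment formula $E[Y_1^{2k}] = \tfrac16 c^{-8k}(8k + 3)!$ with $c^8 = 11!/6$, and takes $\beta$ as the root in $(0, 1)$ of the quadratic $(\gamma_2 - 2)\beta^2 - 2\gamma_2\beta + \gamma_2 = 0$ obtained by setting the expression for $E[X^4]$ equal to 3. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, sqrt

C8 = Fraction(factorial(11), 6)   # c^8 = 11!/6, the choice that makes E[Y1^2] = 1

def even_moment_y1(k):
    """E[Y1^(2k)] = (8k + 3)! / (6 c^(8k)), kept exact as a Fraction."""
    return Fraction(factorial(8 * k + 3), 6) / C8**k

def matching_beta():
    """Root in (0, 1) of (g - 2) b^2 - 2 g b + g = 0, where g = E[Y1^4] - 3."""
    g = float(even_moment_y1(2) - 3)
    return (g - sqrt(2.0 * g)) / (g - 2.0)

def fourth_moment_x(beta):
    """E[X^4] for X = sqrt(beta) * Y2 + sqrt(1 - beta) * Y1."""
    m4 = float(even_moment_y1(2))
    return beta**2 + 6 * beta * (1 - beta) + (1 - beta) ** 2 * m4

if __name__ == "__main__":
    beta = matching_beta()
    print("E[Y1^4] =", even_moment_y1(2))
    print("beta    =", beta)
    print("E[X^4]  =", fourth_moment_x(beta))
```

With these assumptions $E[Y_1^4] = 25194/55 \approx 458.07$, so $Y_1$ is extremely heavy-tailed and only a small weight $1 - \beta$ on $Y_1$ is needed to reach the normal fourth moment.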


SECTION 12. CHARACTERIZATION PROPERTIES OF 
SOME PROBABILITY DISTRIBUTIONS 


There are probability distributions which can be characterized 
uniquely by some properties. In such cases it is natural to use the 
term “characterization properties’. 

Let us formulate two important results connected with the most 
popular distributions, the normal distribution and the Poisson 
distribution. 


Cramér theorem. If the sum $X_1 + X_2$ of the r.v.s $X_1$ and $X_2$ is normally distributed and these variables are independent, then each of $X_1$ and $X_2$ is normally distributed (Cramér 1936).

Raikov theorem. If $X_1$ and $X_2$ are non-negative integer-valued r.v.s such that $X_1 + X_2$ has a Poisson distribution and $X_1$ and $X_2$ are independent, then each of $X_1$ and $X_2$ has a Poisson distribution (Raikov 1938).

These important theorems, several useful corollaries and other 
characterization theorems can be found in the books by Fisz 
(1963), Moran (1968), Feller (1971), Kagan et al (1973), Chow 
and Teicher (1978) and Galambos and Kotz (1978). 

Let us note that some of the examples dealt with in Section 10 
can be compared with the Cramér theorem. In particular this 
comparison shows that the assumption of the independence of $X_1$ and $X_2$ is essential.

We present below various examples of discrete and absolutely 
continuous distributions and clarify whether or not some properties 
are characterization properties. 


12.1. A binomial sum of non-binomial random variables 

Let the r.v.s $X$ and $Y$ be non-negative integer-valued and let their sum $Z = X + Y$ have a binomial distribution with parameters $(n, p)$, $Z \sim Bi(n, p)$. Then the probability generating function of $Z$ is $E[s^Z] = (ps + q)^n$. If additionally we suppose that $X$ and $Y$ are independent, then $(ps + q)^n = E[s^X]E[s^Y]$. Since all factors of the polynomial $(ps + q)^n$ have the form $(ps + q)^k$, $k = 0, 1, \ldots, n$, it follows that each of the variables $X$ and $Y$ is also binomially distributed. This observation leads to the following question: does this conclusion hold without the hypothesis of independence between $X$ and $Y$? Let us show that the answer is negative.

Let $\zeta$ be any non-negative integer-valued r.v. Suppose $\zeta$ takes more than two different values. Define the r.v.s $\xi$ and $\eta$ by

$$\xi = \left[\tfrac12\zeta\right], \quad \eta = \left[\tfrac12(\zeta + 1)\right]$$

where $[x]$ denotes the 'integer part' of $x$. Obviously

$$\zeta = \xi + \eta.$$

Moreover, knowing the distribution of $\zeta$ we can easily compute $P[\xi = k]$ and $P[\eta = m]$ for all possible values of $k$, $m$. Since $P[\xi = k, \eta = m] = 0$ for those $k$, $m$ satisfying the relation $|k - m| > 1$, we see that the r.v.s $\xi$ and $\eta$ are not independent. Note that this property holds irrespective of the distribution of $\zeta$. In particular, suppose $\zeta$ is binomially distributed. Then neither $\xi$ nor $\eta$ is binomial, but their sum $\xi + \eta$, which is equal to $\zeta$, has a binomial distribution. Recall that $\xi$ and $\eta$ are dependent.
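For a concrete binomial $\zeta$ the claims can be verified by direct enumeration. The sketch below uses exact fractions and the parameters $n = 4$, $p = 1/3$, which are arbitrary choices for illustration. The helper `is_binomial` compares a distribution on $\{0, \ldots, m\}$ with $Bi(m, r)$ for $r = \text{mean}/m$, the only binomial law with that support and mean. All names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Exact pmf of Bi(n, p) as a dict {value: probability}."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def split(zeta_pmf):
    """Joint pmf of (xi, eta) = ([zeta/2], [(zeta + 1)/2])."""
    return {(z // 2, (z + 1) // 2): pr for z, pr in zeta_pmf.items()}

def marginal(joint, axis):
    """Marginal pmf of the coordinate with index `axis` (0 for xi, 1 for eta)."""
    out = {}
    for pair, pr in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + pr
    return out

def is_binomial(pmf):
    """True if pmf equals Bi(m, r) with m = largest value and r = mean / m."""
    m = max(pmf)
    r = sum(v * pr for v, pr in pmf.items()) / m
    return pmf == binomial_pmf(m, r)

if __name__ == "__main__":
    zeta = binomial_pmf(4, Fraction(1, 3))
    joint = split(zeta)
    xi, eta = marginal(joint, 0), marginal(joint, 1)
    print("xi binomial? ", is_binomial(xi))
    print("eta binomial?", is_binomial(eta))
    print("independent at (0, 0)?", joint[(0, 0)] == xi[0] * eta[0])
```

Every joint atom $(k, m)$ carries exactly the mass of $\zeta = k + m$, so the sum reproduces the binomial law while both summands fail the binomial test.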


12.2. A property of the geometric distribution which is not its 
characterization property 
Recall that the r.v. $X$ has a geometric distribution with parameter $p$, $0 < p < 1$, if $P[X = n] = pq^n$, $q = 1 - p$, $n = 0, 1, \ldots$.
Let $X_1$ and $X_2$ be independent r.v.s each distributed as $X$. From the definition of a conditional probability we can easily derive that

$$P[X_1 = k \mid X_1 + X_2 = n] = \frac{1}{n + 1}, \quad k = 0, 1, \ldots, n. \tag{1}$$

That is, $X_1 \mid X_1 + X_2 = n$ is discrete uniform on $\{0, 1, \ldots, n\}$.
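The derivation of (1) is short enough to write out. It uses only the independence of $X_1$ and $X_2$ and the geometric probabilities $P[X = n] = pq^n$:

```latex
\begin{align*}
P[X_1 = k,\ X_1 + X_2 = n] &= P[X_1 = k]\,P[X_2 = n - k] = pq^{k}\cdot pq^{n-k} = p^2q^n, \\
P[X_1 + X_2 = n] &= \sum_{j=0}^{n} p^2 q^n = (n + 1)\,p^2q^n, \\
P[X_1 = k \mid X_1 + X_2 = n] &= \frac{p^2q^n}{(n + 1)\,p^2q^n} = \frac{1}{n + 1}, \qquad k = 0, 1, \ldots, n.
\end{align*}
```

The joint probability does not depend on $k$, which is exactly why the conditional distribution is uniform.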

We are interested now in whether (1) is a characterization property of the geometric distribution. More precisely: suppose $X_1$, $X_2$ are integer-valued independent r.v.s which satisfy relation (1), does it follow that $X_1$, $X_2$ are geometrically distributed?

To find the answer let us consider the set $\Omega = \{\omega_{kn} : k = 0, 1, \ldots, n,\ n = 0, 1, 2, \ldots\}$ and let $p_n$, $n = 0, 1, \ldots$ be positive numbers with $\sum_{n=0}^\infty p_n = 1$. Define a probability $P$ on $\Omega$ as follows:

$$P(\omega_{kn}) = \frac{p_n}{n + 1}.$$

This means that $\Omega = \bigcup_{n=0}^\infty \Omega_n$ where $\Omega_n = \{\omega_{kn}, k = 0, 1, \ldots, n\}$, $P(\Omega_n) = p_n$ and each of the outcomes $\omega_{kn}$ has probability $p_n/(n + 1)$. Introduce two r.v.s, $Y_1$ and $Y_2$ such that $Y_1(\omega_{kn}) = k$, $Y_2(\omega_{kn}) = n - k$. Then for $k = 0, 1, \ldots, n$,

$$P[Y_1 = k \mid Y_1 + Y_2 = n] = \frac{P[Y_1 = k, Y_1 + Y_2 = n]}{P[Y_1 + Y_2 = n]} = \frac{P(\omega_{kn})}{P(\Omega_n)} = \frac{p_n/(n + 1)}{p_n} = \frac{1}{n + 1}.$$

Thus relation (1) is true for the r.v.s $Y_1$ and $Y_2$. However, the distribution of $Y_1$ is

$$P[Y_1 = k] = \sum_{n=k}^\infty P(\omega_{kn}) = \sum_{n=k}^\infty \frac{p_n}{n + 1}, \quad k = 0, 1, \ldots$$

and since the $p_n$ are arbitrary (with $\sum p_n = 1$), $P[Y_1 = k]$, $k = 0, 1, \ldots$ can be very different from the geometric distribution.


Therefore (1) is not a characterization property of the geometric distribution. If additionally we suppose that $X_1$ and $X_2$ are independent and identically distributed, it can be proved that each of these variables has a geometric distribution.


12.3. If the random variables X, Y and their sum X + Y each
have a Poisson distribution, this does not imply that X
and Y are independent


(i) Let $X$ and $Y$ be independent r.v.s each with a Poisson distribution. Then their sum $X + Y$ also has a Poisson distribution. We want to know whether the converse of the above statement is true: if $X$ and $Y$ are integer-valued and each of $X$, $Y$ and $X + Y$ has a Poisson distribution, are the variables $X$ and $Y$ independent? It turns out that the answer to this question is negative.

Take two r.v.s, $\xi$ and $\eta$ each with a Poisson distribution of a given rate. Denote their individual distributions by $\{q_i, i = 0, 1, \ldots\}$ and $\{r_j, j = 0, 1, \ldots\}$ where $q_i = P[\xi = i]$ and $r_j = P[\eta = j]$. Introduce the sets $\Lambda_1 = \{(0, 1), (1, 2), (2, 0)\}$ and $\Lambda_2 = \{(0, 2), (2, 1), (1, 0)\}$. The joint distribution of $\xi$ and $\eta$, $p_{ij} := P[\xi = i, \eta = j]$ will be defined in the following way:

$$p_{ij} = \begin{cases} q_i r_j + \varepsilon, & \text{if } (i, j) \in \Lambda_1 \\ q_i r_j - \varepsilon, & \text{if } (i, j) \in \Lambda_2 \\ q_i r_j, & \text{otherwise.} \end{cases} \tag{1}$$

Here $\varepsilon$ is a real number such that $|\varepsilon| < \min\{q_i r_j : (i, j) \in \Lambda_1 \cup \Lambda_2\}$.

It is easy to check that $\{p_{ij}, i = 0, 1, \ldots, j = 0, 1, \ldots\}$ is a two-dimensional discrete probability distribution. Moreover, using (1) we find that the sum $\xi + \eta$ has a Poisson distribution. By definition $\xi$ and $\eta$ also have Poisson distributions. However, (1) implies that the r.v.s $\xi$ and $\eta$ are not independent.
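What makes the construction (1) work is that the six perturbed cells cancel along every row, every column and every anti-diagonal $i + j = \text{const}$, so neither the marginals nor the law of the sum is affected. The sketch below checks this bookkeeping with exact arithmetic; the value of `eps` and the function names are assumptions for illustration, and the base probabilities $q_i r_j$ are not needed for the check.

```python
from fractions import Fraction

PLUS = {(0, 1), (1, 2), (2, 0)}     # cells receiving +eps (the set Lambda_1)
MINUS = {(0, 2), (2, 1), (1, 0)}    # cells receiving -eps (the set Lambda_2)

def perturbation(eps):
    """Signed change p_ij - q_i r_j on the six perturbed cells."""
    delta = {cell: eps for cell in PLUS}
    delta.update({cell: -eps for cell in MINUS})
    return delta

def net_change(delta, key):
    """Total perturbation of each group of cells, grouped by key(i, j)."""
    totals = {}
    for (i, j), d in delta.items():
        totals[key(i, j)] = totals.get(key(i, j), 0) + d
    return totals

if __name__ == "__main__":
    delta = perturbation(Fraction(1, 100))
    print("rows:         ", net_change(delta, lambda i, j: i))
    print("columns:      ", net_change(delta, lambda i, j: j))
    print("anti-diagonal:", net_change(delta, lambda i, j: i + j))
```

All three groupings give zero net change, while the individual cells change by $\pm\varepsilon$; for $\varepsilon \ne 0$ this is precisely the failure of the product form $p_{ij} = q_i r_j$.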


(ii) Here is a case slightly similar to (i). For fixed $\lambda > 0$ let $\varepsilon$ be an


(0, =Ate~?4) 


arbitrary number in the interval displayed above. Define $p_{ij}$, $i = 0, 1, \ldots$, $j = 0, 1, \ldots$, as follows:


2} pa =A — €, 


p33 = Ae A +e, pi = are for all other 7 and j, 


_. FI JS2% ~.. — 144 
Pil = A*e +e, pig= eae 


Direct calculations lead to the following conclusions:
1) $\{p_{ij}, i, j = 0, 1, \ldots\}$ is a two-dimensional discrete distribution of a random vector, say $(X, Y)$;
2) $X \sim Po(\lambda)$ and $Y \sim Po(\lambda)$;
3) $X + Y \sim Po(2\lambda)$.
However the two components $X$ and $Y$ are not independent.


12.4. The Raikov theorem does not hold without the 
independence condition 


Recall that the independence of the variables X and Y is one of the
hypotheses in the Raikov theorem (see the introductory notes). We
are now interested in what happens if we do not assume that X and
Y are independent. Our reasoning is similar to that used in Example
12.1.

Let ζ be a r.v. with a Poisson distribution. Define the r.v.s ξ and
η by


\[ \xi = [\zeta/2], \qquad \eta = [(\zeta + 1)/2] \]
(here [x] denotes the integer part of x). It is easy to verify that each
of ξ and η is an integer-valued r.v., neither ξ nor η has a Poisson
distribution, ξ and η are not independent, but the sum ξ + η = ζ has
a Poisson distribution.
Therefore, as expected, the independence condition in the Raikov
theorem cannot be dropped.


12.5. The Raikov theorem does not hold for a generalized
Poisson distribution of order k, k ≥ 2


We say that the integer-valued non-negative r.v. X has a
generalized Poisson distribution of order k and parameter λ, λ > 0,
if

\[ P[X = n] = \sum \frac{\lambda^{j_1 + \cdots + j_k}}{j_1! \cdots j_k!} \, e^{-k\lambda}, \quad n = 0, 1, 2, \ldots, \tag{1} \]

where the summation is taken over all non-negative integers j_1,...,
j_k such that j_1 + 2j_2 + ··· + kj_k = n. If k = 1, then (1) defines the
usual Poisson distribution. By using (1) we find explicitly the p.g.f.
g(s) = E[s^X] (see Philippou 1983):

\[ g(s) = \exp\left[-\lambda\left(k - \sum_{i=1}^{k} s^i\right)\right], \quad |s| \le 1. \tag{2} \]


Suppose now that Y_1, Y_2 are independent r.v.s taking values in
the set {0, 1, 2, ...} and such that the sum Y_1 + Y_2 has a generalized
Poisson distribution of order k. The question is: does it follow from
this that each of the variables Y_1, Y_2 has a generalized Poisson
distribution of order k?

Note that in the particular case k = 1 the usual Poisson
distribution is obtained and it follows from the Raikov theorem
that the answer to this question is positive (see Example 12.4). We
have to find an answer for k ≥ 2.

Consider two independent r.v.s, Z_1 and Z_2, where Z_1 has a
generalized Poisson distribution of order (k − 1) and a parameter λ,
and Z_2 has the following distribution:

\[ P[Z_2 = n] = \frac{\lambda^{n/k} e^{-\lambda}}{(n/k)!}, \quad n = 0, k, 2k, 3k, \ldots \]

We shall use the explicit form of the p.g.f.s g_1(s) and g_2(s) of Z_1
and Z_2 respectively. Taking (2) into account we find that

\[ g_1(s) = \exp\left[-\lambda\left(k - 1 - \sum_{i=1}^{k-1} s^i\right)\right], \quad |s| \le 1. \]

On the other hand, direct computation shows that

\[ g_2(s) = \exp[-\lambda(1 - s^k)], \quad |s| \le 1. \]

Since Z_1 and Z_2 are independent, the p.g.f. g_3 of the sum Z_1 + Z_2
is the product of g_1 and g_2. Thus

\[ g_3(s) = \exp\left[-\lambda\left(k - \sum_{i=1}^{k} s^i\right)\right], \quad |s| \le 1. \]
But, looking at (2), we see that g_3 is the p.g.f. of a generalized
Poisson distribution of order k. Therefore the r.v. Z = Z_1 + Z_2 has
just this distribution. Moreover, Z is decomposed into a sum of two
independent r.v.s Z_1 and Z_2, neither of which has a generalized
Poisson distribution of order k. The Raikov theorem is therefore
not valid for generalized Poisson distributions of order k ≥ 2.
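
The decomposition can be checked on the level of probabilities. The following sketch assumes that the order-k law is the one with P[X = n] equal to the sum of e^{−kλ} λ^{j_1+...+j_k}/(j_1!···j_k!) over j_1 + 2j_2 + ... + kj_k = n, and that Z_2 takes the value mk with the Poisson probability λ^m e^{−λ}/m!; the values k = 3 and λ = 0.7 are arbitrary illustrative choices.

```python
import math
from itertools import product

def order_k_pmf(n, k, lam):
    """P[X = n] for the order-k law: sum over j1 + 2*j2 + ... + k*jk = n."""
    total = 0.0
    ranges = [range(n // m + 1) for m in range(1, k + 1)]
    for js in product(*ranges):
        if sum(m * j for m, j in zip(range(1, k + 1), js)) == n:
            term = math.exp(-k * lam) * lam ** sum(js)
            for j in js:
                term /= math.factorial(j)
            total += term
    return total

def z2_pmf(n, k, lam):
    """P[Z2 = n]: a Poisson(lam) variable multiplied by k."""
    if n % k:
        return 0.0
    m = n // k
    return math.exp(-lam) * lam**m / math.factorial(m)

k, lam, n_max = 3, 0.7, 8          # illustrative values
# Law of Z1 + Z2, with Z1 of order k - 1 independent of Z2.
conv = [
    sum(order_k_pmf(i, k - 1, lam) * z2_pmf(n - i, k, lam) for i in range(n + 1))
    for n in range(n_max)
]
direct = [order_k_pmf(n, k, lam) for n in range(n_max)]
max_gap = max(abs(a - b) for a, b in zip(conv, direct))
print(max_gap)
```

The two lists agree up to rounding error, in line with the identity g_3 = g_1 g_2 for the p.g.f.s.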


12.6. A case when the Cramér theorem is not applicable 
Recall first that the Cramér theorem can be reformulated in the
following equivalent form. Let F_1(x), x ∈ R¹ and F_2(x), x ∈ R¹ be
non-degenerate d.f.s satisfying the relation

\[ (F_1 * F_2)(x) = \Phi_{a,\sigma}(x) \quad \text{for all } x \in \mathbb{R}^1, \tag{1} \]

where Φ_{a,σ} is the d.f. corresponding to N(a, σ²). Then each of F_1
and F_2 is a normal d.f.
Suppose now that the condition (1) is satisfied only for x ≤ x_0,
where x_0 is a fixed number, x_0 < ∞ (i.e. not for all x ∈ R¹). Is it true
in this case that F_1 and F_2 are normal d.f.s? The answer follows
from the next example.
Denote by Φ = Φ_{0,1} the standard normal d.f. and define the
function:

\[ F_1(x) = \begin{cases} 2 \sum_{n=0}^{\infty} (-1)^n \Phi(x - n), & \text{if } x \le 0 \\ v(x), & \text{if } x > 0 \end{cases} \]

where v(x) is an arbitrary non-decreasing function defined for x ∈
(0, ∞) and such that v(0+) = F_1(0) and v(∞) = 1.
It is easy to check that F_1 is a d.f.; let ξ_1 be a r.v. with this
d.f. Further, let F_2 be the d.f. of the r.v. ξ_2 taking two values, 0 and
1, each with probability 1/2. Then we find that

\[ (F_1 * F_2)(x) = \tfrac{1}{2}[F_1(x) + F_1(x - 1)] = \Phi(x) \quad \text{for all } x \le 0, \]

i.e. condition (1) is satisfied for x ≤ x_0 with x_0 = 0. However if x >
0, then (F_1 * F_2)(x) ≠ Φ(x). Obviously F_1 and F_2 are not normal
d.f.s.
Hence condition (1) in the Cramér theorem cannot be relaxed.
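
The identity on the half-line x ≤ 0 rests on a telescoping alternating series. A small numerical sketch, assuming that for x ≤ 0 the first d.f. is twice the sum of (−1)^n Φ(x − n) over n ≥ 0 and that the second d.f. puts mass 1/2 at each of 0 and 1; truncating the series at 60 terms is our choice.

```python
import math

def Phi(x):
    """Standard normal d.f., written with erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def F1_left(x, terms=60):
    """2 * sum_{n>=0} (-1)^n Phi(x - n); only meant for x <= 0."""
    return 2.0 * sum((-1) ** n * Phi(x - n) for n in range(terms))

def conv_left(x):
    """Convolution with the two-point law on {0, 1}, for x <= 0."""
    return 0.5 * (F1_left(x) + F1_left(x - 1.0))

xs = [-6.0, -3.0, -1.5, -0.5, 0.0]
max_err = max(abs(conv_left(x) - Phi(x)) for x in xs)
print(max_err)
```

For x > 0 the value of the convolution depends on the arbitrary function v, so no such identity is forced there.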


12.7. A pair of unfair dice may behave like a pair of fair dice 


Recall first that a standard and symmetric die (a fair die) is a term
used for a ‘real’ material cube whose six faces are numbered
1, 2, 3, 4, 5, 6 and such that when rolling this die each of the
outcomes has probability 1/6.

Suppose now we have at our disposal four dice: white, black, 
blue and red. The available information is that the white and the 
black dice are standard and symmetric. 

Then the sum X + Y of the numbers on these two dice is a r.v.
which is easy to describe. Clearly, X + Y takes the values 2, 3, 4, 5,
6, 7, 8, 9, 10, 11 and 12 with probabilities

1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36.

Suppose we additionally know that the blue and the red dice are
such that the sum ξ + η of the numbers on these two dice is exactly
as the sum X + Y obtained when rolling the white and the black
dice (i.e. ξ + η takes the same values as X + Y with the same
probabilities shown above). Does this information imply that the
blue die and the red die are fair, i.e. that each is standard and
symmetric?

It turns out the answer to this question is negative, as can be seen
from the following physically realizable situation. Take a pair of
ordinary dice changing, however, the numbers on the faces.
Namely, the faces of the blue die are numbered 1, 2, 2, 3, 3 and 4
while those of the red die are numbered 1, 3, 4, 5, 6 and 8. If ξ and
η are the numbers appearing after rolling these two dice, we easily
find that indeed ξ + η and X + Y have the same distribution.

Hence, despite the facts that the sum X + Y comes from a pair of
fair dice and that X + Y has the same distribution as ξ + η, this does
not imply that the blue and the red dice are fair.

The practical advice is: do not rush to pay the same for a pair of 
dice with fair sums as for a pair of fair dice! 
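
The equality of the two sum distributions can be confirmed by enumerating all 36 equally likely outcomes. The sketch below uses the face numbers given above; the helper name sum_distribution is ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(die_a, die_b):
    """Exact distribution of the sum of two independently rolled dice."""
    counts = Counter(a + b for a, b in product(die_a, die_b))
    total = len(die_a) * len(die_b)
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

fair = [1, 2, 3, 4, 5, 6]
blue = [1, 2, 2, 3, 3, 4]
red = [1, 3, 4, 5, 6, 8]

fair_sum = sum_distribution(fair, fair)
relabelled_sum = sum_distribution(blue, red)
print(fair_sum == relabelled_sum)
```

Exact fractions are used so that the comparison does not depend on rounding.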


12.8. On two properties of the normal distribution which are 
not characterizing properties 


Let X and Y be independent r.v.s distributed N(0,1). Then the ratio
X/Y has a distribution

\[ P[X/Y \le z] = \frac{1}{2\pi} \iint_{x/y \le z} \exp\left(-\tfrac{1}{2}x^2\right) \exp\left(-\tfrac{1}{2}y^2\right) dx\,dy, \quad z \in \mathbb{R}^1. \]

It is easy to check that

\[ \left(P[X/Y \le z]\right)' = \frac{1}{\pi(1 + z^2)}, \quad z \in \mathbb{R}^1. \]

Hence X/Y has a Cauchy distribution. Let us call this property
(N/N→C).


The presence of the property (N/N→C) leads to the following
question. Suppose X and Y are i.i.d. r.v.s with zero means such that
X/Y has a Cauchy distribution. Is it true that X and Y are normally
distributed? By examples we show that the answer to this question
is negative.

(i) Consider two i.i.d. r.v.s ξ and η having the density

\[ f(x) = \frac{\sqrt{2}}{\pi} \, \frac{1}{1 + x^4}, \quad x \in \mathbb{R}^1. \]

If g(z), z ∈ R¹ denotes the density of the ratio ξ/η then we easily
find

\[ g(z) = \frac{d}{dz}\left[\frac{2}{\pi^2} \iint_{x/y \le z} (1 + x^4)^{-1} (1 + y^4)^{-1} dx\,dy\right] = \frac{2}{\pi^2} \int_{-\infty}^{\infty} (1 + y^4)^{-1} (1 + z^4 y^4)^{-1} |y| \, dy = \frac{1}{\pi(1 + z^2)}. \]

Therefore the ratio ξ/η has a Cauchy distribution but obviously
the variables ξ and η are non-normally distributed.


Thus we have established that the property (N/N→C) is not a
characterization property of the normal distribution.
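
The last integral in (i) can be evaluated numerically. The sketch below assumes the density f(x) = (√2/π)/(1 + x⁴) and the standard formula for the density of a ratio of independent variables, namely the integral of |y| f(zy) f(y) over the real line; the midpoint rule and the substitution y = t/(1 − t) are our choices.

```python
import math

def f(x):
    """Density (sqrt(2)/pi) / (1 + x^4)."""
    return (math.sqrt(2.0) / math.pi) / (1.0 + x**4)

def ratio_density(z, n=100_000):
    """Midpoint-rule value of the integral of |y| f(z*y) f(y) over the real line."""
    # The integrand is even in y: integrate over (0, inf) and double.
    # Substitution y = t / (1 - t) maps (0, 1) onto (0, inf).
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        y = t / (1.0 - t)
        jac = 1.0 / (1.0 - t) ** 2
        total += y * f(z * y) * f(y) * jac
    return 2.0 * total * h

def cauchy_density(z):
    return 1.0 / (math.pi * (1.0 + z * z))

max_err = max(abs(ratio_density(z) - cauchy_density(z)) for z in (0.0, 0.5, 1.0, 3.0))
print(max_err)
```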


(ii) Let us consider another case on the same topic. It can be 
checked that 


ole | 
(2) = ———.———.., rER 
Al®)=laeaeey 7 


is a density function. Take two independent r.v.s X_1 and Y_1 each
with density f_1. Then the ratio Z_1 = X_1/Y_1 has a Cauchy
distribution (for details see Steck 1959). Clearly X_1 and Y_1 are not
normally distributed.


(iii) The variables X and Y above are independent, and it is natural
to ask whether this condition is essential for X/Y to be Cauchy
distributed. It turns out that the ratio X/Y has a Cauchy distribution
in some cases when X and Y are dependent. (We can guess that now the
reasoning is more complicated.) Indeed, consider the function


w(t, u) = (1 — 2iu t+ ee a ae t,ueR’. 


It can be shown that ψ is the ch.f. of a two-dimensional random
vector, say (ξ, η), such that: 1) neither ξ nor η is normally
distributed (look at the marginal ch.f.s of ξ and η); 2) ξ and η are
dependent; 3) the ratio ξ/η has a Cauchy distribution. (For details see
Rao 1984.)


(iv) Let X_1,..., X_n be n independent r.v.s each distributed normally
N(0, σ²), n ≥ 2. Define

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, \quad T = \frac{\sqrt{n}\,\bar{X}}{s}. \]

It is well known (see Feller 1971) that T has a Student distribution
with n − 1 degrees of freedom. (Recall that T is often used in
mathematical statistics.)


Let us consider the converse. Suppose X_1,..., X_n are i.i.d. r.v.s
with density f(x), x ∈ R¹, and we are given that the variable T has
a Student distribution. Does it follow from this that f is a normal
density? The example below shows that for n = 2 the answer is
negative.

Let X_1, X_2 be independent r.v.s each with density f and let

\[ \bar{X} = \tfrac{1}{2}(X_1 + X_2), \quad s^2 = \tfrac{1}{2}(X_1 - X_2)^2, \quad T = \sqrt{2}\,\bar{X}/s. \]

Our assumption is that T has a Student distribution. Thus the
problem is to find the unknown density f.
Let us introduce the functions h_1(x_1, x_2), h_2(x, y) and h_3(z, y)
which are the densities of the random vectors (X_1, X_2), (X̄, s) and
(T, s) respectively. We find

\[ h_1(x_1, x_2) = f(x_1) f(x_2), \quad x_1 \in \mathbb{R}^1, \; x_2 \in \mathbb{R}^1, \]
\[ h_2(x, y) = 2^{3/2} f(x + y/\sqrt{2}) \, f(x - y/\sqrt{2}), \quad x \in \mathbb{R}^1, \; y > 0, \]
\[ h_3(z, y) = 2y \, f(y(z + 1)/\sqrt{2}) \, f(y(z - 1)/\sqrt{2}), \quad z \in \mathbb{R}^1, \; y > 0. \]

By the assumption above T has a Student distribution and clearly
in this case (of two variables X_1 and X_2) T has a Cauchy
distribution, that is the density of T is 1/[π(1 + z²)], z ∈ R¹. But the
density of T can be obtained from h_3(z, y) by integration. Thus we
come to the following relation:

\[ 2 \int_0^{\infty} y \, f(y(z + 1)/\sqrt{2}) \, f(y(z - 1)/\sqrt{2}) \, dy = \frac{1}{\pi(1 + z^2)}, \quad z \in \mathbb{R}^1. \tag{1} \]


It can be shown (see Mauldon 1956) that f is an even function
and that the general solution of (1) is of the form f(x) = π^{−1/2} g(x²),
x ∈ R¹, where

\[ \int_0^{\infty} g(u) g(au) \, du = \frac{1}{1 + a}, \quad a > 0. \tag{2} \]

Furthermore, the integral equation (2) has infinitely many
solutions. However, for our purpose it is enough to take e.g. only
two solutions, namely:

\[ \text{(a)} \quad g(u) = e^{-u} \;\Rightarrow\; f(x) = \pi^{-1/2} e^{-x^2}, \quad x \in \mathbb{R}^1; \]
\[ \text{(b)} \quad g(u) = \frac{\sqrt{2/\pi}}{1 + u^2} \;\Rightarrow\; f(x) = \frac{\sqrt{2}}{\pi} \, \frac{1}{1 + x^4}, \quad x \in \mathbb{R}^1. \]

Obviously in case (a) the variables X_1 and X_2 are distributed
normally N(0, 1/2), while in case (b) the distributions of X_1 and X_2
are both non-normal. Therefore we have constructed two i.i.d. r.v.s
X_1 and X_2 whose distribution is non-normal but the variable T has
a Student distribution.
Finally it is interesting to note that the same probability density
function, namely f(x) = (√2/π)/(1 + x⁴), has appeared in both
cases (i) and (iv).
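
The passage from relation (1) to the integral equation (2) in case (iv) can be sketched as follows. We assume that (1) has the form 2∫_0^∞ y f(y(z+1)/√2) f(y(z−1)/√2) dy = 1/[π(1 + z²)] and that f(x) = π^{−1/2} g(x²); the substitution variable u and the abbreviation a are introduced here only for illustration, and z ≠ −1 is assumed.

```latex
\begin{aligned}
\frac{1}{\pi(1+z^2)}
 &= \frac{2}{\pi}\int_0^\infty y\, g\!\left(\tfrac{1}{2}y^2(z+1)^2\right)
    g\!\left(\tfrac{1}{2}y^2(z-1)^2\right) dy
 && \text{(insert } f(x)=\pi^{-1/2}g(x^2)\text{)}\\
 &= \frac{2}{\pi (z+1)^2}\int_0^\infty g(u)\, g(au)\, du,
 && u=\tfrac{1}{2}y^2(z+1)^2,\quad a=\left(\frac{z-1}{z+1}\right)^2,\\
\int_0^\infty g(u)\, g(au)\, du
 &= \frac{(z+1)^2}{2(1+z^2)} = \frac{1}{1+a},
 && \text{since } 1+a=\frac{2(1+z^2)}{(z+1)^2}.
\end{aligned}
```

For g(u) = e^{−u} the left-hand side of the last line is ∫_0^∞ e^{−(1+a)u} du = 1/(1 + a), which is the normal case (a).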

(v) Recall the definition of the so-called beta distribution of the
second kind denoted by β^{(2)}(a, b). We say that the r.v. X ~ β^{(2)}(a,
b) if its density equals x^{a−1}(1 + x)^{−a−b}/B(a, b), if x > 0 and 0, if x
≤ 0. Here a > 0, b > 0.

The following result being of independent interest (see Letac
1995) is used for the reasoning below: Z has a Cauchy distribution
C(0,1) ⟺ Z is symmetric and |Z|² ~ β^{(2)}(1/2, 1/2).

Consider now two independent and symmetric r.v.s X_1 and X_2
such that


Xy|? ~ 8) (5,b) and |X2|? ~ 8 (5,5+6) forsome b> 0. 
Then, referring again to the book by Letac (1995) for details, we
can show that the quotient X_2/X_1 has a Cauchy distribution C(0,1).

Hence we have described another case of independent r.v.s X_1
and X_2 such that their quotient X_2/X_1 follows a Cauchy
distribution. Obviously, neither X_1 nor X_2 is normally distributed.
Note, however, that here we did not make the advance requirement
that X_1 and X_2 have the same distribution.


12.9. Another interesting property which does not 
characterize the normal distribution 


Let us start with the following result (for the proof see Baringhaus
et al 1988). If X and Y are independent r.v.s, Z = XY/(X² + Y²)^{1/2}
and X ~ N(0, σ_1²), Y ~ N(0, σ_2²), then Z ~ N(0, σ²) with
σ² = σ_1²σ_2²/(σ_1 + σ_2)².

It is interesting to point out that Z is a non-linear function of two
normally distributed r.v.s and, as stated, Z itself also has a normal
distribution. This leads to the inverse question: if X and Y are
independent r.v.s and Z is normally distributed, does it follow that
X ~ N and Y ~ N?

Assume the answer is positive. Thus we can suppose e.g. that Z
~ N(0,1). The definition of Z implies that

\[ 1/X^2 + 1/Y^2 \stackrel{d}{=} 1/Z^2. \]

It is easy to find that the distribution of 1/Z² has a Laplace
transform ψ(t) = exp(−√(2t)), t ≥ 0, meaning that 1/Z² has a stable
distribution with parameter 1/2. Let us show that 1/Z² admits the
representation

\[ 1/Z^2 \stackrel{d}{=} U_1 + U_2 \]

where U_1 and U_2 are independent non-negative r.v.s such that the
distribution of each of them does not have an ‘atom’ at 0 and does
not belong to the class of stable distributions with parameter 1/2. For
this we write ψ in the form

\[ \psi(t) = \exp\left[-\frac{1}{\sqrt{2\pi}} \int_0^{\infty} (1 - e^{-tx}) \, x^{-3/2} \, dx\right], \quad t \ge 0, \]

and introduce the following two functions of x ∈ R⁺:


oo Tene) 


ie) = 


l -1/2 
ho(x) = hi(a) + 
Rt +O, (a),  he(a) = hi(z) 


(as usual 1_A(·) is the indicator function of the set A). Denoting


27 


ei(t) = | gy idtte, 528% 
0 


[a-"h;(x) da, ; 
we see that the integrals /1 IMTS 7 = 1, 2, are convergent 


and both

\[ \psi_1(t) = \exp[-\varphi_1(t)], \; t \ge 0, \quad \text{and} \quad \psi_2(t) = \exp[-\varphi_2(t)], \; t \ge 0, \]

are Laplace transforms of infinitely divisible distributions with
support [0, ∞) (see Feller 1971). Since φ_j(t) → ∞ as t → ∞, j = 1,
2, these distributions do not have ‘atoms’ at 0.
Suppose now that U_j is a r.v. having ψ_j as its Laplace transform,
j = 1, 2. We can take U_1 and U_2 to be independent. Then the
Laplace transform of the distribution of U_1 + U_2 equals ψ_1(t)ψ_2(t).
However ψ_1(t)ψ_2(t) = ψ(t). This fact and the reasoning above
imply that 1/Z² is the sum of U_1 and U_2 which are independent but,
obviously, the distributions of 1/√U_1 and 1/√U_2 are not normal as
they might be if the answer to the above question were positive.
The interesting property described at the beginning is therefore
not a characterizing property for a normal distribution.


12.10. Can we weaken some conditions under which two 
distribution functions coincide? 


Let us formulate the following result (see Riedel 1975). Suppose
F_1(x), x ∈ R¹, is an infinitely divisible d.f. and F_2(x) = Φ(x), x ∈ R¹,
where Φ is the standard normal d.f. Then the condition

\[ \lim_{x \to -\infty} F_1(x)/F_2(x) = 1 \tag{1} \]

implies that

\[ F_1 = F_2. \tag{2} \]

It is interesting to show the importance of the conditions in this
result. In particular, the following question is discussed by Blank
(1981): does (1) imply that F_1 = F_2 if we suppose that F_1 and F_2
are arbitrary infinitely divisible d.f.s, F_2(x) > 0, x ∈ R¹? By an
example we show that the answer is negative.

Introduce the functions

\[ G_1(x) = e^{-1} \sum_{0 \le k \le x} \frac{1}{k!}, \qquad G_2(x) = e^{-1} \sum_{0 \le 2k \le x} \frac{1}{k!}, \]

and define F_1 and F_2 as convolutions by

\[ F_1(x) = (\Phi * G_1)(x), \quad F_2(x) = (\Phi * G_2)(x), \quad x \in \mathbb{R}^1. \]

Then both F_1 and F_2 are infinitely divisible and F_2(x) > 0, x ∈ R¹.
Let us now estimate the quantity [F_1(x)/F_2(x)] − 1 in two ways.
We have
We have 


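\[ \frac{F_1(x)}{F_2(x)} - 1 = \frac{\sum_{k=0}^{\infty} (1/k!) \Phi(x - k)}{\sum_{k=0}^{\infty} (1/k!) \Phi(x - 2k)} - 1 \le \frac{\sum_{k=1}^{\infty} (1/k!) \Phi(x - k)}{\Phi(x)} \le \frac{\Phi(x - 1)}{\Phi(x)} \sum_{k=1}^{\infty} (1/k!) \to 0 \quad \text{as } x \to -\infty. \]

On the other hand,

\[ 1 - \frac{F_1(x)}{F_2(x)} \le \frac{\Phi(x - 2)}{\Phi(x)} \sum_{k=1}^{\infty} (1/k!) \to 0 \quad \text{as } x \to -\infty. \]

Thus lim_{x→−∞} [F_1(x)/F_2(x)] = 1, that is relation (1) is satisfied, but
F_1 ≠ F_2.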
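
The behaviour of the ratio can be observed numerically. The sketch below assumes that the two d.f.s are the convolutions of Φ with the laws putting mass e^{−1}/k! at the points k and 2k respectively, k = 0, 1, 2, ...; truncating the series at 40 terms is our choice.

```python
import math

def Phi(x):
    """Standard normal d.f., written with erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def F(x, step, terms=40):
    """(Phi * G)(x), where G puts mass e^{-1}/k! at the point step*k."""
    return math.exp(-1.0) * sum(
        Phi(x - step * k) / math.factorial(k) for k in range(terms)
    )

def ratio(x):
    return F(x, 1) / F(x, 2)

for x in (0.0, -2.0, -5.0, -10.0, -20.0):
    print(x, ratio(x))
```

The ratio is clearly different from 1 near the origin, so the two d.f.s differ, while it approaches 1 rapidly as x decreases.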


12.11. Does the renewal equation determine uniquely the 
probability density? 
Let us start with a sequence {X_i, i = 1, 2,...} of non-negative i.i.d.
r.v.s with a common d.f. F and density f. It is customary to interpret
X_i as a lifetime, or renewal time, and it is important to know the
probability distribution, say H_t, of the variable N_t defined as the
number of renewals on the time interval [0, t]. In some cases it is
even more important to find U(t) = EN_t, the average number of
renewals up to time t, without asking explicitly for H_t. We have

\[ U(t) = F(t) + \int_0^t F(t - s) \, dU(s) \tag{1} \]

and hence the function u(t) = dU(t)/dt (which exists since f = F′
exists), called a renewal density, satisfies

\[ u(t) = f(t) + \int_0^t f(t - s) u(s) \, ds \quad \text{for } t > 0. \tag{2} \]


The term renewal equation is used for both (1) and (2) and we
are interested in how to find U (or u) in terms of F (or f), and
conversely. If f* and u* are the Laplace transforms of f and
u (f*(a) = ∫_0^∞ e^{−at} f(t) dt, u*(a) = ∫_0^∞ e^{−at} u(t) dt), we easily
find from (2) that

\[ u^*(a) = \frac{f^*(a)}{1 - f^*(a)}, \quad a > 0. \tag{3} \]

Obviously, (3) and (2) imply that f determines u uniquely.
Consequently F determines U uniquely. E.g. if F ~ Exp(λ), then
f*(a) = λ/(λ + a), u*(a) = λ/a ⇒ u(t) = λ for all t ⇒ U(t) = λt, a well
known result.

Let us now answer the inverse question: does u determine f
uniquely?

Recall first that the classical renewal theorem (Feller 1971)
states that in general lim_{t→∞} u(t) = 1/μ, where μ = E[X_1] is the
average lifetime.

Let us show that there is a renewal density u(t) with u(t) → 1/μ
as t → ∞ for some μ > 0 and such that f*(a) = u*(a)/[1 + u*(a)]
found from (3) leads to a function f(t) which may not be a
probability density.

Indeed, take u(t) = (1 − e^{−μt})/μ for t ≥ 0 and fixed μ. Obviously,
u(t) → 1/μ as t → ∞. The Laplace transform u*(a) of this u(t) is

\[ u^*(a) = \frac{1}{a(a + \mu)} \quad \Rightarrow \quad f^*(a) = \frac{1}{a^2 + a\mu + 1}. \]

Suppose now that μ < 2 (by assumption μ > 0). Inverting f* we
find that f(t) = e^{−μt/2} (c/2)^{−1} sin(ct/2), 0 < t < ∞, where
c = √(4 − μ²), and that ∫_0^∞ f(t) dt = 1. However the function f is
not a probability density since it takes negative values.
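
Both properties of this function, total mass 1 and the presence of negative values, can be seen numerically. The sketch assumes f(t) = e^{−μt/2} sin(ct/2)/(c/2) with c = √(4 − μ²); the value μ = 1, the integration range and the Simpson rule are illustrative choices.

```python
import math

def f(t, mu):
    """Inverse Laplace transform of 1 / (a^2 + a*mu + 1) for 0 < mu < 2."""
    c = math.sqrt(4.0 - mu * mu)
    return math.exp(-mu * t / 2.0) * math.sin(c * t / 2.0) / (c / 2.0)

def integral(mu, upper=200.0, n=200_000):
    """Composite Simpson approximation of the integral of f over [0, upper]."""
    h = upper / n
    s = f(0.0, mu) + f(upper, mu)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h, mu)
    return s * h / 3.0

mu = 1.0
total_mass = integral(mu)
most_negative = min(f(0.01 * i, mu) for i in range(1, 2000))
print(total_mass, most_negative)
```

The function changes sign whenever sin(ct/2) does, so it integrates to 1 without being non-negative.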


12.12. A property not characterizing the Cauchy distribution 


Suppose the r.v. X has a Cauchy distribution with density f(x) =
1/[π(1 + x²)], x ∈ R¹. Then it is easy to check that the r.v. 1/X has
the same density. This property leads naturally to the following
question. Let X be a r.v. which is absolutely continuous and let its
d.f. be denoted by F(x), x ∈ R¹. Suppose the r.v. 1/X has the same
d.f. F. Does it follow that F is the Cauchy distribution?

Clearly, if the answer to this question is positive, then the
property X \stackrel{d}{=} 1/X would be a characterizing property of the
Cauchy distribution. It turns out, however, that in general the
answer is negative. Let us illustrate this by the following example.
Suppose X is a r.v. with density


\[ g(x) = \begin{cases} \tfrac{1}{4}, & \text{if } |x| \le 1 \\ \tfrac{1}{4x^2}, & \text{if } |x| > 1. \end{cases} \]


Thus X is absolutely continuous and it is easy to check that 1/X has
the same density g. Hence X \stackrel{d}{=} 1/X. However, X does not
have a Cauchy distribution.


12.13. A property not characterizing the gamma distribution 


Let X and Y be independent r.v.s each with a gamma distribution
γ(p, a), p > 0, a > 0; that is, the common density is

\[ f(x) = \begin{cases} 0, & \text{if } x \le 0 \\ \dfrac{a^p}{\Gamma(p)} x^{p-1} e^{-ax}, & \text{if } x > 0. \end{cases} \tag{1} \]

Then the ratio Z = X/Y has the following density:

\[ g(z) = \begin{cases} 0, & \text{if } z \le 0 \\ \dfrac{\Gamma(2p)}{(\Gamma(p))^2} z^{p-1} (1 + z)^{-2p}, & \text{if } z > 0 \end{cases} \tag{2} \]

(beta distribution of the second kind; see Example 12.8).

This connection between gamma and beta distributions leads to
the next question. Let X and Y be positive independent r.v.s such
that the ratio Z = X/Y has a density given by (2). Does this imply
that each of X and Y has a gamma distribution?

Let us show that the answer to this question is negative. To see
this, introduce the following two functions, where a > 0, p > 0:

\[ f_1(x) = \begin{cases} 0, & \text{if } x \le 0 \\ c_1 x^{-p-1} e^{-a/x}, & \text{if } x > 0, \end{cases} \qquad f_2(x) = \begin{cases} 0, & \text{if } x \le 0 \\ c_2 x^{p-1} / (1 + x^2)^{p + 1/2}, & \text{if } x > 0. \end{cases} \]

It is easy to check that with

\[ c_1 = a^p / \Gamma(p) \quad \text{and} \quad c_2 = 2\Gamma(p + \tfrac{1}{2}) / [\Gamma(\tfrac{1}{2}p) \, \Gamma(\tfrac{1}{2}p + \tfrac{1}{2})] \]

f_1 and f_2 are density functions. Take two independent r.v.s, say ξ_1
and η_1, each with density f_1. Then we can establish that the density
g_1 of the ratio ζ_1 = ξ_1/η_1 coincides with the density g given by (2).
Clearly, f_1 does not have the form (1).


The same conclusion can be derived if we start with two
independent r.v.s, ξ_2 and η_2, each having the density f_2. In this
case again the density of ζ_2 = ξ_2/η_2 coincides with (2) while f_2 is
not of the form (1).


12.14. An interesting property which does not characterize 
uniquely the inverse Gaussian distribution 


We say that the r.v. X has an inverse Gaussian distribution with
parameters μ > 0 and λ > 0 if the density f of X is given by

\[ f(x) = \begin{cases} \left(\dfrac{\lambda}{2\pi x^3}\right)^{1/2} \exp\left[-\dfrac{\lambda (x - \mu)^2}{2\mu^2 x}\right], & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases} \tag{1} \]

It is easy to see that all moments m_n = E[X^n], n = 1, 2,..., exist.
Moreover, X has an analytic ch.f., hence this distribution is
determined uniquely by its moment sequence {m_n, n = 1, 2,...}.

It is interesting to note that all negative-order moments of X are
also finite, that is E[X^{−n}] exists for each positive integer n. Further,
a standard transformation leads to the following interesting
relation:

\[ E[X^{-n}] = E[X^{n+1}] / (EX)^{2n+1}, \quad n = 1, 2, \ldots. \tag{2} \]

This relation and the uniqueness of the moment problem
mentioned above motivate the conjecture: if X is a positive r.v.
such that all moments E[X^n] and E[X^{−n}], n = 1, 2,..., exist and
satisfy (2), then X has an inverse Gaussian distribution. It turns out,
however, that this conjecture is not correct.

Note firstly that EX = μ and let for simplicity μ = 1. Then (2) has
the form E[X^{−n}] = E[X^{n+1}]. Further, the density (1) satisfies the
relation

\[ f(1/x) = x^3 f(x), \quad x > 0. \tag{3} \]

Thus the density f of the inverse Gaussian distribution can be
considered as a solution of the functional equation (3).

Let Y be a r.v. whose density g satisfies (3). Then it is easy to
check that the relation (2) is fulfilled for Y. So it is clear that if (3)
has a unique solution, namely f by (1), then our conjecture will be
true; otherwise it will be false. To clarify this consider the function
g given by


a ] 
| — —~ ——., if r >0 
0, if «<0. 


It can be verified directly that g is a probability density function 
which satisfies (3). As a consequence, Y satisfies (2). 
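
The step from the functional equation (3) to the moment relation (2) is a change of variables. A sketch, assuming μ = 1 and that (3) reads g(1/x) = x³ g(x) for x > 0 (the substitution y = 1/x is ours):

```latex
\begin{aligned}
E[Y^{-n}] &= \int_0^\infty y^{-n} g(y)\, dy
           = \int_0^\infty x^{n}\, g(1/x)\, x^{-2}\, dx && (y = 1/x)\\
          &= \int_0^\infty x^{n-2}\, x^{3} g(x)\, dx
           = \int_0^\infty x^{n+1} g(x)\, dx = E[Y^{n+1}], && n = 0, 1, 2, \ldots
\end{aligned}
```

Taking n = 0 gives EY = 1, so under this assumption the factor (EY)^{2n+1} in (2) equals 1.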

Therefore we have found two r.v.s, X and Y, whose densities (1)
and (4) are different, and nevertheless both satisfy relation (2).
Thus the relation (2) is not a characterizing property of the inverse
Gaussian distribution.

Finally we suggest that the reader considers equation (3) and 
tries to find other solutions to it which will provide new r.v.s 
satisfying relation (2). 


SECTION 13. DIVERSE PROPERTIES OF RANDOM 
VARIABLES 


In this section we consider examples devoted to different 
properties of r.v.s and their numerical characteristics. Some 
notions are defined in the examples themselves. 


13.1. On the symmetry property of the sum or the difference 
of two symmetric random variables 


Recall first that the r.v. X is called symmetric about 0 if
X \stackrel{d}{=} (−X). In terms of the d.f. F, the density f and the ch.f. φ this
property is expressed as follows: F(−x) = 1 − F(x) for all x ≥ 0;
f(−x) = f(x) for all x ∈ R¹; φ(t), t ∈ R¹, takes only real values. By the
examples below we analyse the symmetry and the independence
properties under summation or subtraction.


(i) If X and Y are identically distributed and independent r.v.s, then
their difference X − Y is symmetric about 0. Suppose we know that
X \stackrel{d}{=} Y and that the difference X − Y is symmetric. Does it follow
that X and Y are independent? To see this consider the random
vector (X, Y) defined as follows:


It is easy to check that _X and Y have the same distribution, so 


if 
taking the values 1, 2 and 3 with probability equal to 6° 6 and 5 
respectively. Obviously X and Y are not independent. Further, the
difference Z = X − Y takes the values −2, −1, 0, 1, 2 with


ay os 


distribution. Hence Z = X − Y is a symmetric r.v. despite
the fact that the variables X and Y are not independent.


(ii) If X and Y are symmetric and independent r.v.s, then the sum Z
= X + Y is again symmetric. Thus it is of interest to discuss the
following question. Suppose X and Y are independent r.v.s and we
know that X is symmetric and that the sum Z = X + Y is also
symmetric. Is it true that Y is symmetric? Intuitively we could
expect a positive answer. It turns out, however, that in general the
answer is negative. This is illustrated by the following example.

Let ξ be a r.v. with the following ch.f. indicating that ξ is
symmetric:

\[ \psi_\xi(t) = \begin{cases} 1 - 2|t|, & \text{if } |t| \le \tfrac{1}{2} \\ 0, & \text{if } |t| > \tfrac{1}{2}. \end{cases} \tag{1} \]


Consider two other ch.f.s:


1—|t|, if |t|< 4 1—|t|, if |t}<1 
hat) aes if |t| > T va(t) (), if |t| > 1. 


Introduce now a r.v. η with ch.f. ψ_η which is the mixture of h_1
and h_2:

\[ \psi_\eta(t) = \tfrac{1}{2} e^{-it} h_1(t) + \tfrac{1}{2} e^{it} h_2(t), \quad t \in \mathbb{R}^1. \]

Elementary transformations show that


, 1 tay — J (1 — le) cost, if |t| < 


| | 


where ε(t) = 1, if |t| ≤ 1 and ε(t) = 0, if |t| > 1.

The explicit form (2) of the ch.f. ψ_η shows that the r.v. η is not
symmetric.

Thus we have described two r.v.s, ξ and η, the first being
symmetric while the second is not. Assuming that ξ and η are
independent we look for the properties of the sum ζ = ξ + η. Since
for the ch.f.s ψ_ξ, ψ_η and ψ_ζ we have ψ_ζ = ψ_ξψ_η, in view of (1) and
(2), it is not difficult to find that

\[ \psi_\zeta(t) = \begin{cases} (1 - 2|t|)(1 - |t|) \cos t, & \text{if } |t| \le \tfrac{1}{2} \\ 0, & \text{if } |t| > \tfrac{1}{2}. \end{cases} \]

Obviously ψ_ζ takes only real values which means that the r.v. ζ is
symmetric.

Therefore the symmetry of the two variables ξ and ζ = ξ + η,
together with the independence of ξ and η, does not imply that η is
symmetric.

Here is another equivalent interpretation: the difference, and
hence the sum, of two dependent r.v.s, both symmetric, need not be
symmetric.


13.2. When is a mixture of normal distributions infinitely 
divisible? 


Let G(u), u ∈ R⁺, be a d.f. Then the function ψ(t), t ∈ R¹, where

\[ \psi(t) = \int_0^{\infty} \exp\left(-\tfrac{1}{2} t^2 u\right) dG(u) \tag{1} \]

is a ch.f. The d.f. F with ch.f. ψ is called a mixture of normal
distributions and G a mixing distribution. Note that the density f of
F corresponding to (1) has the form (see Kelker 1971):

\[ f(x) = \int_0^{\infty} (2\pi u)^{-1/2} \exp(-x^2/(2u)) \, dG(u). \]

Since the normal distribution is infinitely divisible it is natural to
ask whether such a mixture preserves the infinite divisibility. It is
easy to check that if G is an infinitely divisible d.f. then ψ is an
infinitely divisible ch.f. Now we want to answer the converse
question: if ψ is an infinitely divisible ch.f., does it follow that the
mixing distribution G is infinitely divisible?

Consider the function H(x), x ∈ R¹, where

H(x) = 0, 0.26, 0.52, 0.48, 0.74 and 1

respectively in the intervals

(−∞, 1], (1, 2], (2, 3], (3, 4], (4, 5] and (5, ∞).

Clearly H is not a d.f. However, we obtain the following
interesting and unexpected fact: the convolution H * H is a d.f.
Moreover, the function

\[ \int_0^{\infty} (2\pi u)^{-1/2} \exp(-x^2/(2u)) \, dH(u) \]

is a density. Define G as follows:

\[ G(x) = e^{-1} \sum_{n=0}^{\infty} H^{*n}(x)/n!. \tag{2} \]


We can verify that G given by (2) is a d.f. and find that

\[ \int_0^{\infty} \exp\left(-\tfrac{1}{2} t^2 u\right) dG(u) = e^{-1} \sum_{n=0}^{\infty} \frac{1}{n!} \left[\int_0^{\infty} \exp\left(-\tfrac{1}{2} t^2 u\right) dH(u)\right]^n \]
\[ = \exp\left[\int_0^{\infty} \exp\left(-\tfrac{1}{2} t^2 u\right) dH(u) - 1\right] \]
\[ = \exp\left\{\int_{-\infty}^{\infty} (\cos(tx) - 1) \left[\int_0^{\infty} (2\pi u)^{-1/2} \exp(-x^2/(2u)) \, dH(u)\right] dx\right\}. \]

It is easy to see that the last expression in this chain of equalities
coincides with the Kolmogorov canonical representation for an
infinitely divisible ch.f. provided that
∫_0^∞ u^{−1/2} exp(−x²/(2u)) dH(u) ≥ 0 for all x > 0 (see Gnedenko
and Kolmogorov 1954). But H satisfies this condition by
construction.

Therefore ψ defined by (1), with G given by (2), is an infinitely
divisible ch.f.

It remains for us to show that G in (2) is not infinitely divisible.
This follows from the Lévy-Khintchine representation for the ch.f.
of G and from the fact that H is not non-decreasing.
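
The fact that H fails to be a d.f. while H * H is one can be verified by exact arithmetic on the jumps. The sketch below reads the jumps of H at the points 1, ..., 5 from the listed values 0, 0.26, 0.52, 0.48, 0.74, 1; the helper name convolve is ours.

```python
from fractions import Fraction

# Values of H on the successive intervals, as exact fractions.
values = [Fraction(0), Fraction(26, 100), Fraction(52, 100),
          Fraction(48, 100), Fraction(74, 100), Fraction(1)]
# Jump of H at the point k is the difference of neighbouring values.
jumps = {k: values[k] - values[k - 1] for k in range(1, 6)}

def convolve(p, q):
    """Convolution of two signed measures carried by integer points."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, Fraction(0)) + pa * qb
    return out

hh = convolve(jumps, jumps)
print(min(jumps.values()), min(hh.values()), sum(hh.values()))
```

H has a negative jump at the point 3, whereas all jumps of H * H are positive and add up to 1.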


13.3. A distribution function can belong to the class IFRA but 
not to IFR 


Let F(x), x ≥ 0, be a d.f. with density f. We say that F is an
increasing failure rate distribution and write F ∈ IFR, if its failure
rate r(x) := f(x)/(1 − F(x)) is increasing in x, x > 0. In this case
−log[1 − F(x)] is a convex function in the domain where −log[1 −
F(x)] is finite. This observation motivates the more general
definition: F ∈ IFR if −log[1 − F(x)] is convex where finite.
However, for some problems it is necessary to introduce a
considerably weaker restriction on F. For example, if F has density
f and failure rate r such that (1/x) ∫_0^x r(u) du is increasing in x, we
say that F has an increasing failure rate average. In this case we
write F ∈ IFRA. More generally, F ∈ IFRA if (−1/x) log[1 − F(x)]
is increasing where finite.

Thus we have introduced two classes of distributions, IFR and
IFRA, and it is natural to ask what the relationship between them
is.

According to Barlow and Proschan (1966), if F ∈ IFR and F(0) =
0 then F ∈ IFRA. We are interested now in whether the converse is
true. To see this, consider the function

\[ F(x) = \begin{cases} 0, & \text{if } x < 0 \\ (1 - e^{-x})(1 - e^{-kx}), & \text{if } x \ge 0, \; k > 1. \end{cases} \]

It is easy to check that F ∈ IFRA but F ∉ IFR.
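
A numerical illustration of this claim, assuming F(x) = (1 − e^{−x})(1 − e^{−kx}) for x ≥ 0; the value k = 3 and the grid are illustrative choices. A check on a finite grid is of course not a proof.

```python
import math

def survival(x, k):
    """1 - F(x), expanded to avoid cancellation for large x."""
    return math.exp(-x) + math.exp(-k * x) - math.exp(-(k + 1) * x)

def density(x, k):
    return math.exp(-x) + k * math.exp(-k * x) - (k + 1) * math.exp(-(k + 1) * x)

def failure_rate(x, k):
    return density(x, k) / survival(x, k)

def failure_rate_average(x, k):
    """(-1/x) log[1 - F(x)]."""
    return -math.log(survival(x, k)) / x

k = 3.0
grid = [0.01 * i for i in range(1, 801)]
rates = [failure_rate(x, k) for x in grid]
averages = [failure_rate_average(x, k) for x in grid]

rate_is_monotone = all(b >= a for a, b in zip(rates, rates[1:]))
average_is_increasing = all(b >= a - 1e-12 for a, b in zip(averages, averages[1:]))
print(rate_is_monotone, average_is_increasing)
```

The failure rate rises above 1 and then decreases towards 1, so it is not monotone, while the failure rate average increases on the whole grid.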


13.4. A continuous distribution function of the class NBU 
which is not of the class IFR 


A d.f. F (of a non-negative r.v.) is said to belong to the class NBU
(new and better than used) if for any x, y ≥ 0 we have

\[ \bar{F}(x + y) \le \bar{F}(x) \bar{F}(y) \quad \text{where } \bar{F} = 1 - F. \tag{1} \]

If for any y > 0 the function [F(x + y) − F(x)]/F̄(x) is increasing
in x, we say that F is of the class IFR (compare this definition with
that given in Example 13.3).

It is well known that F ∈ IFR ⇒ F ∈ NBU, but in general the
converse implication is not true (see Barlow and Proschan 1966).

The d.f. F ∈ IFR has the property that it is continuous on the set
{x : F(x) < 1} and, moreover, h(x) = −log F̄(x) is a convex
function. However, the elements of the class NBU need not be
continuous. This follows from a simple example. Indeed, consider
the function

\[ F(x) = 1 - 2^{-k} \quad \text{for } x \in [k, k + 1), \; k = 0, 1, 2, \ldots. \]

It is easy to check that (1) is satisfied and hence F ∈ NBU.
Obviously F is discontinuous and hence F ∉ IFR.

Suppose now that F ∈ NBU and F is continuous. Does it follow
from these conditions that F ∈ IFR? It turns out that the answer is
negative. To see this consider the function


51 (a = snj+1, if e€ (4, 00). 


It is easy to check that F(x) = 1 − e^{−h(x)}, x ≥ 0, is a d.f. and,
moreover, that

\[ h(x + y) \ge h(x) + h(y), \quad x, y \ge 0. \]

Therefore F ∈ NBU and clearly F is continuous. Nevertheless F
∉ IFR since h(x) = −log F̄(x) is not a convex function.


13.5. Exchangeable and tail events related to sequences of 
random variables 

Let {X_n, n ≥ 1} be an infinite sequence of r.v.s defined on the
probability space (Ω, ℱ, P). Denote by σ{X_1,...,X_n} the σ-field
generated by X_1,...,X_n. Then clearly
∪_{k≥1} σ{X_n, X_{n+1},...,X_{n+k}} is a field and let σ{X_n, X_{n+1},...} be
the σ-field generated by this field. The sequence of σ-fields σ{X_n,
X_{n+1},...}, σ{X_{n+1}, X_{n+2},...},... is non-increasing, its limit exists
and is a σ-field. This limit is denoted by

\[ \mathcal{T} = \bigcap_{n=1}^{\infty} \sigma\{X_n, X_{n+1}, \ldots\}. \]

𝒯 is called the tail σ-field of the sequence {X_n, n ≥ 1}. Any event A
∈ 𝒯 is called a tail event, and any function on Ω which is
measurable with respect to 𝒯 is said to be a tail function.


Let us formulate the basic result concerning the tail events and 
functions. 


Kolmogorov 0-1 law. Let {X_n} be a sequence of independent
r.v.s and 𝒯 be its tail σ-field. Then for any tail event A, A ∈ 𝒯,
either P(A) = 0 or P(A) = 1. Moreover, any tail function is a.s.
constant, that is, if Y is a r.v. such that σ{Y} ⊂ 𝒯 then P[Y = c] = 1
with c = constant.

We now introduce another notion. (Also see Example 7.14.) We
say that the r.v.s X_1,...,X_n are exchangeable (another term is
symmetrically dependent) if for each of the n! permutations {i_1, i_2,
..., i_n} of {1, 2,..., n}, the random vectors (X_{i_1}, X_{i_2},..., X_{i_n}) and
(X_1, X_2,..., X_n) have the same distribution. Further, an infinite
sequence {X_n, n ≥ 1} is said to be exchangeable if for each n the
r.v.s X_1,...,X_n are exchangeable. The ℬ^∞-measurable function
g(X_1, X_2,...) is called exchangeable if it is invariant under all finite
permutations of its arguments: g(X_1,..., X_n, X_{n+1},...) = g(X_{i_1},
..., X_{i_n}, X_{n+1},...). In particular, A ∈ σ{X_1, X_2,...} is called an
exchangeable event if its indicator function 1(A) is an exchangeable
function.

Let us formulate a result concerning the exchangeability. 


Hewitt-Savage 0-1 law. Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s
which are independent and identically distributed. Then for any
exchangeable event $A \in \sigma\{X_1, X_2, \ldots\}$ either $P(A) = 0$ or $P(A) = 1$.

Note that a detailed presentation of the notions and results given 
briefly above can be found in the books by Feller (1971), Laha and 
Rohatgi (1979), Chow and Teicher (1978), Aldous (1985) and 
Galambos (1988). 

Obviously tailness and exchangeability are close notions and it 
would be useful to illustrate by a few examples the relationships 
between them. 


(i) The first question concerns the tail and exchangeable events.
Let $\{X_n, n \ge 1\}$ be a sequence of (real-valued) r.v.s and $\mathcal{T}$ its tail
$\sigma$-field. If $A \in \mathcal{T}$, then for each $n \ge 1$ we can write $A$ in the form
$\{(X_{n+1}, X_{n+2}, \ldots) \in B_{n+1}\}$ where $B_{n+1}$ is a Borel set in $\mathbb{R}^{\infty}$,
that is, $B_{n+1} \in \mathcal{B}^{\infty}$. Thus for each $n$ and any permutation
$\{i_1, \ldots, i_n\}$ of $\{1, \ldots, n\}$,

$$A = \{(X_1, X_2, \ldots) \in \mathbb{R}^n \times B_{n+1}\} = \{(X_{i_1}, \ldots, X_{i_n}, X_{n+1}, \ldots) \in \mathbb{R}^n \times B_{n+1}\}$$

and since $B_{\infty} = \mathbb{R}^n \times B_{n+1}$ is a Borel set in $\mathbb{R}^{\infty}$, this implies that
the tail event $A$ is also an exchangeable event.

However, there are exchangeable events which are not tail
events. The simplest example is to take $A = \{X_n = 0$ for all $n \ge 1\}$.
Obviously $A$ is an exchangeable event. But $A \notin \sigma\{X_n, X_{n+1}, \ldots\}$
for every $n \ge 2$. So $A$ is not a tail event.
(ii) Now let us clarify the possibility of changing some of the
conditions in the Hewitt-Savage 0-1 law. Consider the sequence
$\{X_n, n \ge 1\}$ of independent r.v.s where $P[X_1 = 1] = P[X_1 = -1] = \frac{1}{2}$
and $P[X_n = 0] = 1$ for $n \ge 2$. The event
$A = \{\sum_{j=1}^{n} X_j > 0$ for infinitely many $n\}$ is clearly an exchangeable
(but not tail!) event with respect to the infinite sequence of r.v.s
$\{X_n, n \ge 1\}$. Moreover, $P(A) = P[X_1 > 0] = \frac{1}{2}$ and hence $P(A)$ is neither
0 nor 1, contrary to what one might expect. Here the Hewitt-Savage 0-1 law
is not applicable since the independent r.v.s $X_n$, $n \ge 1$, are not
identically distributed.


(iii) Let $X_n$, $n \ge 1$, be independent r.v.s such that

$$P[X_n = 0] = 1 - 2^{-n}, \quad P[X_n = 1] = 2^{-n}, \quad n = 1, 2, \ldots$$

and let $A = \{X_n = 0$ for all $n \ge 1\}$. Then $A$ is an exchangeable but
not a tail event. Further, we have

$$P(A) = \prod_{n=1}^{\infty} P[X_n = 0] = \prod_{n=1}^{\infty} (1 - 2^{-n}).$$

Since $\sum_{n=1}^{\infty} 2^{-n} < \infty$, the infinite product $\prod_{n=1}^{\infty} (1 - 2^{-n})$
converges to a positive limit which is strictly less than 1 and hence
$P(A)$ is neither 0 nor 1.
Therefore again, as in case (ii), the Hewitt-Savage 0-1 law does
not hold. Note that here the variables $X_n$, $n \ge 1$, are independent
and take the same values but with different probabilities.
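
The value of the infinite product in case (iii) can be checked numerically. The following is only a sketch: the function name `partial_product` and the cut-off of 60 factors are illustrative choices, and floating-point arithmetic is assumed to be accurate enough for the purpose.

```python
def partial_product(terms):
    """Return prod_{n=1}^{terms} (1 - 2**-n), a partial product for P(A)."""
    product = 1.0
    for n in range(1, terms + 1):
        product *= 1.0 - 2.0 ** (-n)
    return product

# Beyond about 60 factors each term equals 1 up to double precision,
# so this partial product is a good approximation of the limit.
p_a = partial_product(60)
print(p_a)
```

The partial products decrease and stabilize at a value strictly between 0 and 1, in agreement with the conclusion that P(A) is neither 0 nor 1.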


13.6. The de Finetti theorem for an infinite sequence of 
exchangeable random variables does not always hold for 
a finite number of such variables 


Let $\{X_n, n \ge 1\}$ be an infinite sequence of exchangeable r.v.s each
taking two values, 0 and 1. Then according to the de Finetti
theorem (see Feller 1971) there is a unique probability measure $\mu$
on the Borel $\sigma$-field $\mathcal{B}_{[0,1]}$ of the interval [0, 1] such that for each $n$
we have

(1)  $P[X_1 = e_1, \ldots, X_n = e_n] = \int_{[0,1]} p^k (1 - p)^{n-k} \, \mu(dp)$

where $e_i = 0$ or 1 and $k = e_1 + \ldots + e_n$.

In other words, the distribution of the number of occurrences of
the outcomes 0 and 1 in a finite segment of length $n$ of the infinite
sequence $X_1, X_2, \ldots$ of exchangeable variables is always a mixture
of binomial distributions with respect to some proper distribution over
the interval [0, 1]. Thus we come to the question of whether this result
holds for a fixed finite exchangeable sequence. The answer can be
found from the following two examples.


(i) Consider the case $n = 2$ and the r.v.s $X_1$ and $X_2$:

$$P[X_1 = 0, X_2 = 1] = P[X_1 = 1, X_2 = 0] = \tfrac{1}{2}, \quad P[X_1 = 0, X_2 = 0] = P[X_1 = 1, X_2 = 1] = 0.$$

It is easy to see that $X_1$ and $X_2$ are exchangeable. Suppose a
representation like (1) holds. Then it would follow that

$$\int_0^1 p^2 \, \mu(dp) = 0 \quad \text{and} \quad \int_0^1 (1 - p)^2 \, \mu(dp) = 0.$$

This means that $\mu$ puts mass one both at 0 and at 1, which is not
possible.


(ii) Let $Y_1, \ldots, Y_n$ be $n$ independent r.v.s with some common
distribution. Let $S_n = Y_1 + \ldots + Y_n$ and $Z_k = Y_k - n^{-1} S_n$ for
$k = 1, \ldots, n - 1$. Then it is easy to check that the r.v.s $Z_1, \ldots, Z_{n-1}$ are
exchangeable but their joint distribution is not of the form (1).
Therefore the de Finetti theorem does not always hold for a finite
exchangeable sequence.


13.7. Can we always extend a finite set of exchangeable 
random variables? 
If $\{X_n\}$ is a finite or an infinite sequence of exchangeable r.v.s then
any subset consists of r.v.s which are exchangeable.
Suppose now we are given the set $X_1, \ldots, X_m$ of exchangeable
r.v.s. We say that $X_1, \ldots, X_m$ can be extended exchangeably if there
is a finite set $X_1, \ldots, X_m, X_{m+1}, \ldots, X_{m+k}$, $k \ge 1$, or an infinite set
$X_1, \ldots, X_m, X_{m+1}, X_{m+2}, \ldots$ of r.v.s which are exchangeable. Thus the
question to consider is: can we extend any fixed set of
exchangeable r.v.s to an infinite exchangeable sequence? Let us
show that in general the answer is negative.

Consider the particular case of three r.v.s $X_1, X_2, X_3$ each taking
the values 0 or 1 with $P[X_j = 1] = P[X_j = 0] = \frac{1}{2}$, $j = 1, 2, 3$. Let

$$P[X_1 = 1, X_2 = 1] = P[X_1 = 1, X_3 = 1] = P[X_2 = 1, X_3 = 1] = 0.2.$$

It is easy to see that $(X_1, X_2, X_3)$ is an exchangeable set. Assume
this set can be extended to an infinite exchangeable sequence
$X_1, X_2, X_3, X_4, X_5, \ldots$. This would mean that for each $n \ge 4$ the set
$X_1, X_2, X_3, X_4, \ldots, X_n$ consists of exchangeable variables. Then we can
easily show that

$$0 \le V[X_1 + \ldots + X_n] = n P[X_1 = 1] + n(n-1) P[X_1 = 1, X_2 = 1] - (n P[X_1 = 1])^2$$
$$= \tfrac{1}{2} n + (0.2) n (n - 1) - \tfrac{1}{4} n^2 = (0.3) n - (0.05) n^2.$$

Obviously it follows from this that $n$ must satisfy the restriction $n \le 6$.
However, this contradicts the definition of infinite
exchangeability and therefore the desired extension of a finite to an
infinite exchangeable sequence is not always possible.
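
The restriction on n can be verified with exact arithmetic. The sketch below uses the marginal probability 1/2 and the pairwise probability 0.2 from the example; the function name `variance_of_sum` and the search range are illustrative assumptions.

```python
from fractions import Fraction

def variance_of_sum(n):
    """V[X_1 + ... + X_n] for n exchangeable 0-1 variables with
    P[X_i = 1] = 1/2 and P[X_i = 1, X_j = 1] = 1/5 for i != j."""
    p_one = Fraction(1, 2)
    p_both = Fraction(1, 5)
    # E[(sum)^2] = n E[X_1^2] + n(n-1) E[X_1 X_2]; note X_i^2 = X_i.
    second_moment = n * p_one + n * (n - 1) * p_both
    return second_moment - (n * p_one) ** 2

# Largest n for which the formula still gives a non-negative variance.
largest_n = max(n for n in range(1, 50) if variance_of_sum(n) >= 0)
print(largest_n, variance_of_sum(7))
```

For n = 7 the formula returns a negative number, which no variance can be, so no exchangeable family of seven such variables exists.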

Interesting results on exchangeability of finite or infinite 
sequences of random events or r.v.s can be found in the works of 
Kendall (1967) and Galambos (1988). 

Finally let us mention that the variables in an infinite
exchangeable sequence are necessarily non-negatively correlated.
This follows directly from an examination of the terms of the
variance $V[X_1 + \ldots + X_n]$. However, in the above specific example
we have $\rho(X_1, X_2) < 0$.


13.8. Collections of random variables which are or are not 
independent and are or are not exchangeable 


Let $\mathcal{X} := \{X_n, n \ge 1\}$ be a finite or infinite sequence of r.v.s which
are independent and identically distributed. Then $\mathcal{X}$ is
exchangeable, that is, both properties, independence and
exchangeability, hold for $\mathcal{X}$ in this case. If, however, the $X_n$ (at least
two of them) have different distributions then $\mathcal{X}$ is not
exchangeable regardless of whether $\mathcal{X}$ is independent or dependent.

Our goal now is to describe a sequence of r.v.s which are totally
dependent and exchangeable. To this end, consider a sequence
$\mathcal{X} = \{X_n, n \ge 1\}$ of i.i.d. r.v.s each with zero mean and finite variance.
Let $\xi$ be another r.v. independent of $\mathcal{X}$. Assume that $\xi$ is non-degenerate
with finite variance, that is, $0 < V\xi < \infty$. Let us define a new
sequence

$$\mathcal{Y} = \{Y_n, n \ge 1\} \quad \text{where} \quad Y_n = X_n + \xi.$$

It is easily seen that $\mathcal{Y}$ is exchangeable. Let us clarify whether or
not $\mathcal{Y}$ is independent.
The distribution of $Y_{j_1}, \ldots, Y_{j_k}$ is the same for any possible
choice of $k$ variables from $\mathcal{Y}$, $k = 1, 2, \ldots$. Taking $k = 2$ we conclude
that $\mathcal{Y}$ is characterized by a common correlation coefficient, say $\rho_0$,
where

$$\rho_0 = \rho(Y_i, Y_j) = (E[Y_i Y_j] - EY_i \, EY_j)/(VY_i \, VY_j)^{1/2}$$

for any two representatives $Y_i$ and $Y_j$ of $\mathcal{Y}$. A simple reasoning
shows that

$$\rho_0 = (V\xi)/(VX_1 + V\xi)$$

where $VX_1$ $(= E[X_1^2])$ is the common variance of the sequence $\mathcal{X}$.
The assumption $0 < V\xi < \infty$ implies that $\rho_0 \ne 0$ (in fact $\rho_0 > 0$) and
hence $Y_i$ and $Y_j$ cannot be independent because they are not even
uncorrelated. In other words, $\mathcal{Y}$ is totally dependent in the sense
that there is no pair of variables in $\mathcal{Y}$ which is independent. Hence
the sequence $\mathcal{Y}$, finite or infinite, is dependent and exchangeable.
The final conclusion is that neither of these two properties,
independence and exchangeability, implies the other.
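
The formula for the common correlation coefficient can be checked by direct enumeration in a small special case. The sketch below assumes, purely for illustration, that each X takes the values -1 and 1 with probability 1/2 (variance 1) and that the common shift takes the values 0 and 1 with probability 1/2 (variance 1/4); the names `x_dist`, `xi_dist` and `expect` are hypothetical.

```python
from fractions import Fraction
from itertools import product

x_dist = {-1: Fraction(1, 2), 1: Fraction(1, 2)}  # zero mean, variance 1
xi_dist = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # variance 1/4

def expect(func):
    """E[func(Y_1, Y_2)] with Y_i = X_i + xi, by summing over all outcomes."""
    total = Fraction(0)
    for (x1, p1), (x2, p2), (xi, q) in product(
        x_dist.items(), x_dist.items(), xi_dist.items()
    ):
        total += p1 * p2 * q * func(x1 + xi, x2 + xi)
    return total

mean_y = expect(lambda y1, y2: y1)
var_y = expect(lambda y1, y2: y1 * y1) - mean_y ** 2
cov_y = expect(lambda y1, y2: y1 * y2) - mean_y ** 2
rho = cov_y / var_y
print(rho)
```

Under these assumptions the enumeration gives the ratio of the variance of the shift to the sum of the two variances, (1/4)/(1 + 1/4), so the two variables are positively correlated and hence dependent.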


13.9. Integrable randomly stopped sums with non-integrable 
stopping times 

Let $X$ and $X_1, X_2, \ldots$ be i.i.d. r.v.s defined on the probability space
$(\Omega, \mathcal{F}, P)$ and let $\{\mathcal{F}_n, n \ge 0\}$, where $\mathcal{F}_0 = \{\emptyset, \Omega\}$, be an increasing
sequence of sub-$\sigma$-fields of $\mathcal{F}$. Recall that the r.v. $\tau$ with possible
values in the set $\{1, 2, \ldots, \infty\}$ is said to be a stopping time with
respect to $\{\mathcal{F}_n\}$ if the set $[\omega : \tau(\omega) = n]$, denoted further simply by
$[\tau = n]$, belongs to $\mathcal{F}_n$ for each $n$. If $S_0 = 0$, $S_n = X_1 + \ldots + X_n$ then
$S_\tau$ is the sum of the first $\tau$ of the variables $X_1, X_2, \ldots$, that is
$S_\tau = X_1 + \ldots + X_\tau$. For many problems it is important to have conditions
under which the r.v.s $X$, $\tau$ and $S_\tau$ are integrable. Let us formulate
the following result (see Gut and Janson 1986). Let $r \ge 1$ and $EX \ne 0$.
Then $E[|X|^r] < \infty$ and $E[|S_\tau|^r] < \infty$ imply that $E[\tau^r] < \infty$.

Our aim now is to show that the condition $EX \ne 0$ is essential for
the validity of this result. So, let the r.v. $X$ have $EX = 0$. In
particular, take $X$ such that $P[X = 1] = P[X = -1] = \frac{1}{2}$ and
$\tau = \min\{n : S_n = 1\}$. Clearly $\tau$ is a stopping time with respect to $\{\mathcal{F}_n\}$
where $\mathcal{F}_n = \sigma\{X_1, \ldots, X_n\}$. It is easy to check that the r.v. $X$ and the
random sum $S_\tau$ have moments of all orders, that is, for any $r > 0$
we have $E[|X|^r] < \infty$ and $E[|S_\tau|^r] < \infty$. However, $E[\tau^{1/2}] = \infty$ and
therefore $E[\tau^r]$ is infinite for any $r \ge \frac{1}{2}$.
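
The divergence of the moment of order 1/2 can be made visible numerically. The sketch below relies on the classical first-passage formula P[tau = 2k+1] = C_k / 2^(2k+1), where C_k is the k-th Catalan number; this formula is not derived in the text and is taken here as a known fact about the symmetric walk. The function name `sqrt_moment_partial_sum` is illustrative.

```python
import math

def sqrt_moment_partial_sum(max_k):
    """Sum of sqrt(2k+1) * P[tau = 2k+1] over k = 0, ..., max_k."""
    prob = 0.5  # P[tau = 1]
    total = 0.0
    for k in range(max_k + 1):
        total += math.sqrt(2 * k + 1) * prob
        # Ratio P[tau = 2k+3] / P[tau = 2k+1] = (2k+1) / (2(k+2)),
        # which follows from C_{k+1} / C_k = 2(2k+1) / (k+2).
        prob *= (2 * k + 1) / (2 * (k + 2))
    return total

for max_k in (100, 10_000, 1_000_000):
    print(max_k, sqrt_moment_partial_sum(max_k))
```

The terms behave like a constant times 1/k, so the partial sums keep growing roughly like a logarithm of the cut-off and never settle, which is consistent with the infinite moment of order 1/2.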


Chapter 3
SECTION 14. VARIOUS KINDS OF CONVERGENCE OF
SEQUENCES OF RANDOM VARIABLES 


On the probability space $(\Omega, \mathcal{F}, P)$ we have a given r.v. $X$ and a
sequence of r.v.s $\{X_n, n \ge 1\}$. Important probabilistic problems
require us to find the limit of $X_n$ as $n \to \infty$. However, this limit can
be understood in different ways. Let us define basic kinds of
convergence considered in probability theory.

(a) We say that $\{X_n\}$ converges almost surely (a.s.), or with
probability 1, to $X$ as $n \to \infty$ and write $X_n \xrightarrow{a.s.} X$ if

$$P[\omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)] = 1.$$

(b) The sequence $\{X_n\}$ is said to converge in probability to $X$ as
$n \to \infty$ if for any $\varepsilon > 0$ we have

$$\lim_{n \to \infty} P[\omega : |X_n(\omega) - X(\omega)| \ge \varepsilon] = 0.$$

In this case the following notation is used: $X_n \xrightarrow{P} X$ or
$P\text{-}\lim_{n \to \infty} X_n = X$.

(c) Let $F$ and $F_n$ be the d.f.s of $X$ and $X_n$ respectively. The
sequence $\{X_n\}$ is called convergent in distribution to $X$ if

$$\lim_{n \to \infty} F_n(x) = F(x)$$

for all $x \in \mathbb{R}^1$ at which $F$ is continuous. Notation: $F_n \xrightarrow{d} F$,
and $X_n \xrightarrow{d} X$.

(d) Suppose $X$ and $X_n$, $n \ge 1$, belong to the space $L^r$ for some $r > 0$
(that is $E[|X|^r] < \infty$, $E[|X_n|^r] < \infty$). We say that the sequence $\{X_n\}$
converges to $X$ in $L^r$-sense, and write $X_n \xrightarrow{L^r} X$, if

$$\lim_{n \to \infty} E[|X_n - X|^r] = 0.$$

In particular, the $L^r$-convergence with $r = 2$ is called square mean
(or quadratic mean) convergence and is used very often in probability
theory and mathematical statistics.

Note that the convergence in distribution defined in (c) is closely
related to the weak convergence treated in Section 16 (see also
Example 14.1(iii)). Some notions (complete convergence, weak
$L^1$-convergence and convergence of the Cesàro means) are
introduced and analysed in Examples 14.14-14.18.

Practically all textbooks and lecture notes in probability theory 
deal extensively with the topics of convergence of random 
sequences. The reader is advised to consider the following 
references: Parzen (1960), Neveu (1965), Lamperti (1966), Moran 
(1968), Ash (1970), Rényi (1970), Feller (1971), Roussas (1973), 
Neuts (1973), Chung (1974), Petrov (1975), Lukacs (1975), Chow 
and Teicher (1978), Billingsley (1995), Laha and Rohatgi (1979), 
Serfling (1980) and Shiryaev (1995). 

It is usual for any course in probability theory to justify the
following scheme.

    convergence a.s.          =>
                                  convergence in probability  =>  convergence in distribution
    convergence in L^r-sense  =>

In this section we consider examples which are different in their
content and level of difficulty but all illustrate this general scheme
clearly. In particular, we show that the inclusions shown above are
all strict, that is, none of them can be reversed in general. The
relationship between these four kinds of convergence and other kinds
of convergence of random sequences is also analysed.


14.1. Convergence and divergence of sequences of distribution 
functions 


We now summarize a few elementary statements showing that a
sequence of d.f.s $\{F_n, n \ge 1\}$ can have different behaviour as
$n \to \infty$. In particular it can be divergent, or convergent, but not to a d.f.

(i) Let $F(x)$, $x \in \mathbb{R}^1$, be a d.f. which is continuous. Consider two sets
of d.f.s, $\{F_n, n \ge 1\}$ and $\{G_n, n \ge 1\}$ where

$$F_n(x) = F(x + n), \quad G_n(x) = F(x + (-1)^n n).$$

Obviously $F_n(x) \to 1$ as $n \to \infty$ for each $x \in \mathbb{R}^1$. But a function
equal to 1 at all points is not a d.f. Hence $\{F_n\}$ is convergent but
the limit $\lim_{n \to \infty} F_n(x)$ is not a d.f.
Further, $G_{2n}(x) \to 1$ whereas $G_{2n+1}(x) \to 0$ as $n \to \infty$ for all
$x \in \mathbb{R}^1$. Clearly the family $\{G_n\}$ does not converge.
(ii) Consider the family of d.f.s $\{F_n, n \ge 1\}$ where

$$F_n(x) = \begin{cases} 0, & \text{if } x < 1/n \\ 1, & \text{if } x \ge 1/n. \end{cases}$$

Then $F_n(x) \to F(x)$ if $x \ne 0$ and $F_n(0) = 0$ for each $n \ge 1$, where $F$
is the d.f. of a degenerate r.v. which is zero with probability 1.
Thus $\lim_{n \to \infty} F_n(x)$ exists but is not a d.f. because it is not
right-continuous at $x = 0$.


(iii) The following basic result is always used when considering
convergence in distribution (see Lukacs 1970; Feller 1971; Chow
and Teicher 1978; Billingsley 1995; Shiryaev 1995):

(1)  $F_n \xrightarrow{d} F \iff \int_{\mathbb{R}^1} g(x) \, dF_n(x) \to \int_{\mathbb{R}^1} g(x) \, dF(x)$

for all continuous and bounded functions $g(x)$, $x \in \mathbb{R}^1$.

Despite the fact that (1) contains a necessary and sufficient
condition it is useful to show that the assumptions for $g$ cannot be
weakened. For example take $g$ bounded and measurable (but not
continuous), say

$$g(x) = \begin{cases} 0, & \text{if } x \le 0 \\ 1, & \text{if } x > 0. \end{cases}$$

Denote by $F$ and $F_n$ the d.f.s of the r.v.s $X \equiv 0$ and $X_n \equiv 1/n$
respectively. Then $F_n \xrightarrow{d} F$ and obviously $\int g \, dF_n = 1$ for each
$n \ge 1$ but $\int g \, dF = 0$. Therefore (1) does not hold, as we of course
expected.

Finally, recall that the integral relation (1) can be used as a
definition of the weak convergence of d.f.s (see Example 14.9 and
the topics discussed in Section 16).


14.2. Convergence in distribution does not imply convergence 
in probability 


We show by a few specific examples that in general, as $n \to \infty$,

$$X_n \xrightarrow{d} X \not\Rightarrow X_n \xrightarrow{P} X.$$

(i) Let $X$ be a Bernoulli variable, that is $X$ is a r.v. taking the values
1 and 0 with probability $\frac{1}{2}$ each. Let $\{X_n, n \ge 1\}$ be a sequence of
r.v.s such that $X_n = X$ for any $n$. Since $X_n \overset{d}{=} X$ then $X_n \xrightarrow{d} X$ as
$n \to \infty$. Now let $Y = 1 - X$. Thus $X_n \xrightarrow{d} Y$ because $Y$ and $X$ have
the same distribution. However, $X_n$ cannot converge to $Y$ in any
other mode since $|X_n - Y| = 1$ always. In particular,

$$P[|X_n - Y| > \varepsilon] \not\to 0$$

for an arbitrary $\varepsilon \in (0, 1)$ and therefore $X_n \overset{P}{\nrightarrow} Y$ as $n \to \infty$.


(ii) Let $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, $\mathcal{F}$ be the $\sigma$-field of all subsets of $\Omega$
and $P$ the discrete uniform measure. Define the r.v.s $X_n$, $n \ge 1$, and
$X$ where

$$X_n(\omega_1) = X_n(\omega_2) = 1, \quad X_n(\omega_3) = X_n(\omega_4) = 0, \quad n \ge 1,$$
$$X(\omega_1) = X(\omega_2) = 0, \quad X(\omega_3) = X(\omega_4) = 1.$$

Then $|X_n(\omega) - X(\omega)| = 1$ for all $\omega \in \Omega$ and $n \ge 1$. Hence as in
case (i), $X_n$ cannot converge in probability to $X$ as $n \to \infty$. Further,
if $F$ and $F_n$, $n \ge 1$, are the d.f.s of $X$ and $X_n$, $n \ge 1$, we have

$$F(x) = F_n(x) = \begin{cases} 0, & \text{if } x < 0 \\ \frac{1}{2}, & \text{if } 0 \le x < 1 \\ 1, & \text{if } x \ge 1. \end{cases}$$

Thus $F_n(x) = F(x)$ for all $x \in \mathbb{R}^1$ and trivially $F_n(x) \to F(x)$ at each
continuity point of $F$. Therefore $X_n \xrightarrow{d} X$ but, as was shown,
$X_n \overset{P}{\nrightarrow} X$.

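(iii) Let $X$ be any symmetric r.v., for example $X \sim N(0, 1)$, and put
$X_n = -X$ for each $n \ge 1$. Then $X_n \overset{d}{=} X$ and $X_n \xrightarrow{d} X$. However,
$X_n \overset{P}{\nrightarrow} X$ because for an arbitrary $\varepsilon > 0$ we have (in the normal
case, with $\Phi$ the standard normal d.f.)

$$P[|X_n - X| > \varepsilon] = P[|X| > \tfrac{1}{2}\varepsilon] = 2(1 - \Phi(\tfrac{1}{2}\varepsilon)) \not\to 0 \quad \text{as } n \to \infty.$$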


14.3. Sequences of random variables converging in probability 
but not almost surely 


(i) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ be the Lebesgue measure. For
every number $n \in \mathbb{N}$ there is only one pair of integers, $m$ and $k$,
where $m \ge 0$, $0 \le k \le 2^m - 1$, such that $n = 2^m + k$. Define the
sequence of events $A_n = [k 2^{-m}, (k + 1) 2^{-m})$ and put
$X_n = X_n(\omega) = 1_{A_n}(\omega)$. Thus we obtain a sequence of r.v.s $\{X_n, n \ge 1\}$.
Obviously, for $0 < \varepsilon \le 1$,

$$P[|X_n| \ge \varepsilon] = P(A_n) = 2^{-m}.$$

Since $n \to \infty$ iff $m \to \infty$, we can conclude that

(1)  $X_n \xrightarrow{P} 0$ as $n \to \infty$.

Now we want to see whether in (1) the convergence in
probability can be replaced by almost sure convergence.

It is easy to show that for each fixed $\omega \in [0, 1)$ there are always
infinitely many $n$ for which $X_n(\omega) = 1$ and infinitely many $n$ such
that $X_n(\omega) = 0$. Indeed, for each $m$, $\omega \in [k 2^{-m}, (k + 1) 2^{-m})$ for exactly
one $k$ where $k = 0, 1, \ldots, 2^m - 1$, that is $\omega \in A_{2^m + k}$. Obviously, if
$k < 2^m - 1$ then also $\omega \notin A_{2^m + k + 1}$, and if $k = 2^m - 1$ (and $m \ge 1$) then also
$\omega \notin A_{2^{m+1}}$. In other words, $\omega \in A_n$ i.o., and also $\omega \in \Omega \setminus A_n$ i.o., which
means that $\limsup_{n \to \infty} X_n = 1$ and $\liminf_{n \to \infty} X_n = 0$.
Therefore

$$X_n \overset{a.s.}{\nrightarrow} 0 \quad \text{as } n \to \infty.$$
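
The behaviour of this sequence at a fixed point can be traced directly. The following sketch evaluates the indicators for one sample point; the helper names `level_and_offset` and `x_n`, the choice of the point 0.3 and the range of n are illustrative.

```python
def level_and_offset(n):
    """Return (m, k) with n = 2**m + k and 0 <= k <= 2**m - 1."""
    m = n.bit_length() - 1
    return m, n - (1 << m)

def x_n(n, omega):
    """Indicator of A_n = [k 2^-m, (k+1) 2^-m) evaluated at omega."""
    m, k = level_and_offset(n)
    return 1 if k / 2 ** m <= omega < (k + 1) / 2 ** m else 0

omega = 0.3
values = [x_n(n, omega) for n in range(1, 2 ** 10)]
# Number of indices n in the block 2^m, ..., 2^(m+1) - 1 with X_n(omega) = 1.
ones_per_level = [
    sum(x_n(n, omega) for n in range(2 ** m, 2 ** (m + 1))) for m in range(10)
]
print(ones_per_level)
```

In every block of indices the value 1 occurs exactly once and the value 0 occurs at all other indices, so the sequence of values at the fixed point keeps returning to both 0 and 1 while the probability of the value 1 tends to zero.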


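(ii) Consider the sequence $\{X_n, n \ge 1\}$ of independent r.v.s where

$$P[X_n = 1] = \frac{1}{n}, \quad P[X_n = 0] = 1 - \frac{1}{n}, \quad n \ge 1.$$

Obviously for any $\varepsilon$, $0 < \varepsilon < 1$, we have

$$P[|X_n - 0| > \varepsilon] = P[X_n = 1] = \frac{1}{n} \to 0 \quad \text{as } n \to \infty$$

and thus $X_n \xrightarrow{P} 0$ as $n \to \infty$. It turns out, however, that the
convergence $X_n \xrightarrow{a.s.} 0$ fails to hold. Let us analyse a more general
situation. For given r.v.s $X$, $X_n$, $n \ge 1$, define the events

$$A_n(\varepsilon) = \{\omega : |X_n(\omega) - X(\omega)| > \varepsilon\}, \quad B_m(\varepsilon) = \bigcup_{n=m}^{\infty} A_n(\varepsilon).$$

Then

(2)  $X_n \xrightarrow{a.s.} X$ as $n \to \infty$ $\iff$ $P(B_m(\varepsilon)) \to 0$ as $m \to \infty$ for all $\varepsilon > 0$.

Indeed, let

$$C = \{\omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\}, \quad A(\varepsilon) = \{\omega \in \Omega : \omega \in A_n(\varepsilon) \text{ i.o.}\}.$$

Then $P(C) = 1$ iff $P(A(\varepsilon)) = 0$ for all $\varepsilon > 0$. However $\{B_m(\varepsilon)\}$ is a
decreasing sequence of events, $B_m(\varepsilon) \downarrow A(\varepsilon)$ as $m \to \infty$ and so
$P(A(\varepsilon)) = 0$ iff $P(B_m(\varepsilon)) \to 0$ as $m \to \infty$. Thus (2) is proved.

Using statement (2) for our specific sequence $\{X_n\}$ yields

$$P(B_m(\varepsilon)) = 1 - \lim_{M \to \infty} P[X_n = 0 \text{ for all } n \text{ such that } m \le n \le M].$$

By the independence of $X_n$,

$$P(B_m(\varepsilon)) = 1 - \left(1 - \frac{1}{m}\right)\left(1 - \frac{1}{m+1}\right) \cdots$$

and since the product $\prod_{k=m}^{\infty} (1 - k^{-1})$ is zero for each $m \in \mathbb{N}$ we
conclude that $P(B_m(\varepsilon)) = 1$ for all $m$, that is $P(B_m(\varepsilon))$ does not tend
to zero as (2) requires. Therefore the sequence $\{X_n\}$ does not
converge almost surely.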
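
The vanishing of the infinite product is a telescoping effect, which exact arithmetic makes explicit. In the sketch below the function name `prob_all_zero` and the particular values of m and M are illustrative.

```python
from fractions import Fraction

def prob_all_zero(m, upper):
    """P[X_n = 0 for all m <= n <= upper] = prod_{k=m}^{upper} (1 - 1/k)."""
    result = Fraction(1)
    for k in range(m, upper + 1):
        result *= 1 - Fraction(1, k)
    return result

# Each factor equals (k-1)/k, so the product telescopes to (m-1)/upper.
for upper in (10, 100, 1000):
    print(upper, prob_all_zero(5, upper))
```

Since the finite product equals (m-1)/M, it tends to zero as M grows for every fixed m, which is exactly the statement used above.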


14.4. On the Borel-Cantelli lemma and almost sure 
convergence 


Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s such that for each $\varepsilon > 0$,

(1)  $\sum_{n=1}^{\infty} P[|X_n| > \varepsilon] < \infty.$

According to the Borel-Cantelli lemma, if $\{A_n, n \ge 1\}$ is an
arbitrary sequence of events and $\sum_{n=1}^{\infty} P(A_n) < \infty$, then
$P[A_n$ i.o.$] = 0$ (see also Example 4.2). This lemma and condition (1)
immediately imply that $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$. Moreover, the same
conclusion, $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$, holds if for any sequence of
numbers $\{\varepsilon_n\}$ with $\varepsilon_n \downarrow 0$, we have

(2)  $\sum_{n=1}^{\infty} P[|X_n| > \varepsilon_n] < \infty.$

We now want to clarify whether the converse of the statement is
true. For this purpose consider the probability space $(\Omega, \mathcal{F}, P)$
where $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ is the Lebesgue measure.
Define the sequence of r.v.s $\{X_n, n \ge 1\}$ by

$$X_n(\omega) = \begin{cases} 0, & \text{if } 0 \le \omega \le 1 - n^{-1} \\ 1, & \text{if } 1 - n^{-1} < \omega \le 1. \end{cases}$$

Obviously $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$. However, for any $\varepsilon_n > 0$ with
$\varepsilon_n \downarrow 0$ we have $P[|X_n| > \varepsilon_n] = P[X_n = 1] = n^{-1}$ for sufficiently large $n$.
Thus

$$\sum_{n=1}^{\infty} P[|X_n| > \varepsilon_n] = \infty.$$

Therefore condition (2) is sufficient but not necessary for the
almost sure convergence of $X_n$ to zero.


14.5. On the convergence of sequences of random variables in
$L^r$-sense for different values of r


Suppose $X$ and $X_n$, $n \ge 1$, are r.v.s in $L^r$ for some fixed $r$, $r > 0$.
Then $X, X_n \in L^s$ for each $s$, $0 < s < r$. This follows from the well
known Lyapunov inequality (see Feller 1971; Shiryaev 1995),
$(E[|X|^s])^{1/s} \le (E[|X|^r])^{1/r}$, $0 < s < r$, or from the elementary
inequality $|x|^s \le 1 + |x|^r$, $x \in \mathbb{R}^1$, $0 < s < r$ (used once before in
Example 6.5). In other words

$$X_n \xrightarrow{L^r} X \Rightarrow X_n \xrightarrow{L^s} X \quad \text{for } 0 < s < r.$$

Let us illustrate the fact that in general the converse is not true.
Consider the sequence of r.v.s $\{X_n, n \ge 1\}$ where

$$P[X_n = n] = n^{-(r+s)/2}, \quad P[X_n = 0] = 1 - n^{-(r+s)/2}, \quad n \ge 1.$$

Then we find

$$E[X_n^s] = n^{(s-r)/2} \to 0 \quad \text{as } n \to \infty$$

which implies that $X_n \xrightarrow{L^s} 0$. However,

$$E[X_n^r] = n^{(r-s)/2} \to \infty \quad \text{as } n \to \infty$$

and therefore $X_n \xrightarrow{L^s} 0 \not\Rightarrow X_n \xrightarrow{L^r} 0$ for all $r > s$.


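14.6. Sequences of random variables converging in probability
but not in $L^r$-sense

(i) Let $\{X_n, n \ge 1\}$ be r.v.s such that

$$P[X_n = e^n] = \frac{1}{n}, \quad P[X_n = 0] = 1 - \frac{1}{n}, \quad n \ge 1.$$

Then for any $\varepsilon > 0$ we have

$$P[|X_n| < \varepsilon] \ge P[X_n = 0] = 1 - \frac{1}{n} \to 1 \quad \text{as } n \to \infty$$

and hence $X_n \xrightarrow{P} 0$ as $n \to \infty$. However, for each $r > 0$,

$$E[X_n^r] = e^{rn} \cdot \frac{1}{n} \to \infty \quad \text{as } n \to \infty$$

and therefore $X_n \overset{L^r}{\nrightarrow} 0$ as $n \to \infty$.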


(ii) Consider the sequence $\{X_n, n \ge 1\}$ where $X_n$ has a density

$$f_n(x) = \frac{n}{\pi} \cdot \frac{1}{1 + n^2 x^2}, \quad x \in \mathbb{R}^1, \quad n \ge 1$$

(that is, $X_n$ has a Cauchy distribution). Since for any $\varepsilon > 0$,

$$P[|X_n| \le \varepsilon] = \int_{-\varepsilon}^{\varepsilon} f_n(x) \, dx = \frac{2}{\pi} \arctan(n\varepsilon) \to 1 \quad \text{as } n \to \infty$$

we conclude that $X_n \xrightarrow{P} 0$ as $n \to \infty$. But for any $r \ge 1$,
$E[|X_n|^r] = \infty$ and thus the sequence $\{X_n\}$ cannot converge to zero as
$n \to \infty$ in $L^r$-sense for $r \ge 1$. Moreover, since $X_n$, $n \ge 1$, do not belong
to the space $L^r$ it is not sensible to speak about $L^r$-convergence.
(iii) Define the sequences {Y,, 1 => 2} and {Z,, n= 1] as follows: 
PY = | = loon = L—P|y¥,. =U. 

24, =0|\Siah™, Pil4, =n) Shien"). Oars: 
Then, as n — 00, Yn +0 but Yn we 0 for any r > 0; Zn +0 but 
Zn, 30, 


14.7. Convergence in $L^r$-sense does not imply almost sure
convergence

(i) Consider again the sequence $\{X_n, n \ge 1\}$ defined in Example
14.3(i): namely, $X_n = X_n(\omega) = 1_{A_n}(\omega)$ for $n = 2^m + k$ and
$A_n = [k 2^{-m}, (k + 1) 2^{-m})$. Since $\omega \in \Omega = [0, 1]$ and $P$ is the Lebesgue
measure then $E[|X_n|^r] = E[X_n] = 2^{-m} \to 0$ as $n \to \infty$. Thus

$$X_n \xrightarrow{L^r} 0 \quad \text{as } n \to \infty.$$

Nevertheless, as was shown in Example 14.3(i), $X_n \overset{a.s.}{\nrightarrow} 0$ as
$n \to \infty$.


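(ii) Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s defined by

$$P[X_n = n^{1/(2r)}] = 1/n, \quad P[X_n = 0] = 1 - 1/n, \quad n \ge 1$$

where $r > 0$ is arbitrary. It is easy to see that

$$E[|X_n|^r] = E[X_n^r] = (n^{1/(2r)})^r \, n^{-1} = n^{-1/2} \to 0 \quad \text{as } n \to \infty.$$

Therefore for any $r > 0$, $X_n \xrightarrow{L^r} 0$ as $n \to \infty$.
Let $M < N$ be positive integers. Since $X_n$ are independent, we
find

$$P[X_n = 0 \text{ for all } M \le n \le N] = \prod_{n=M}^{N} (1 - 1/n).$$

The continuity of $P$ ($B_n \downarrow B_\infty \Rightarrow P(B_n) \downarrow P(B_\infty)$) implies that

$$P\Big[\bigcap_{n=M}^{\infty} \{\omega : X_n(\omega) < \varepsilon\}\Big] = \lim_{N \to \infty} \prod_{n=M}^{N} (1 - 1/n)$$

for arbitrary $\varepsilon \in (0, 1]$ and integer $M$. Separately we can check that for
arbitrary $M$, $\prod_{n=M}^{N} (1 - 1/n) \to 0$ as $N \to \infty$, that is,
$\prod_{n=M}^{\infty} (1 - 1/n) = 0$. Thus

$$P\Big[\bigcup_{M=1}^{\infty} \bigcap_{n=M}^{\infty} \{\omega : X_n(\omega) < \varepsilon\}\Big] = 0.$$

Since the r.v.s $X_n$ are non-negative this relation means that the
sequence $\{X_n\}$ cannot converge almost surely.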


(iii) Let {Y,, n = 1] be a sequence of independent r.v.s given by 
Ply, =0)=1-1/n*, Pl¥,=+1])=1/(2n*), n>1. 


, WL as. 

Then it can be shown that Yn — 0 but Yn 7? Vas n > ow, 

(iv) Let $\{S_n, n \ge 1\}$ be a symmetric Bernoulli walk, that is
$S_n = \xi_1 + \ldots + \xi_n$ where $\xi_j$ are i.i.d. r.v.s each taking the values
(+1) or (-1) with probability $\frac{1}{2}$. Define $X_n = X_n(\omega) = 1_{[S_n = 0]}(\omega)$,
$n \ge 1$. Then for every $r > 0$ we have

$$\lim_{n \to \infty} E[X_n^r] = 0.$$

Thus $X_n \xrightarrow{L^r} 0$ as $n \to \infty$ for $r > 0$. However, the symmetric
random walk $\{S_n\}$ is recurrent in the sense that $S_n$ crosses the level
zero for infinitely many values of $n$ (for details see Feller 1968;
Chung 1974; Shiryaev 1995). This means that $X_n = 1$ i.o. and
therefore $X_n \overset{a.s.}{\nrightarrow} 0$ as $n \to \infty$.


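14.8. Almost sure convergence does not necessarily imply
convergence in $L^r$-sense


(i) Define the sequence of r.v.s $\{X_n, n \ge 1\}$ as follows:

$$P[X_n = 0] = 1 - 1/n^{\alpha}, \quad P[X_n = n] = P[X_n = -n] = 1/(2n^{\alpha}), \quad n \ge 1.$$

Since $E[|X_n|^{1/2}] = 1/n^{\alpha - 1/2}$ we find that $\sum_{n=1}^{\infty} E[|X_n|^{1/2}] < \infty$
for any $\alpha > \frac{3}{2}$. According to the Markov inequality we have
$P[|X_n| > \varepsilon] \le \varepsilon^{-1/2} E[|X_n|^{1/2}]$ and hence $\sum_{n=1}^{\infty} P[|X_n| > \varepsilon] < \infty$ for every
$\varepsilon > 0$. Using the Borel-Cantelli lemma as in Example 14.4 we
conclude that $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$.

Further, $E[|X_n|^2] = 1/n^{\alpha - 2}$ and hence for any $\alpha \le 2$, $X_n \overset{L^2}{\nrightarrow} 0$ as
$n \to \infty$.
Therefore, if $\alpha \in (\frac{3}{2}, 2]$ then $X_n \xrightarrow{a.s.} 0$ but $X_n \overset{L^2}{\nrightarrow} 0$ as $n \to \infty$.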


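(ii) Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s where $X_n$ takes the values
$e^n$ and 0, with probability $n^{-2}$ and $1 - n^{-2}$ respectively. Since for
any $\varepsilon > 0$, $P[|X_n| > \varepsilon] = P[X_n > \varepsilon] \le P[X_n = e^n] = n^{-2}$ and

$$\sum_{n=1}^{\infty} P[|X_n| > \varepsilon] \le \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty$$

we conclude as in case (i) above that $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$.
Obviously,

$$E[|X_n|^r] = E[X_n^r] = e^{rn}/n^2 \to \infty \quad \text{as } n \to \infty$$

for any $r > 0$. Therefore, as $n \to \infty$, $X_n \xrightarrow{a.s.} 0$ but $X_n \overset{L^r}{\nrightarrow} 0$ for all
$r > 0$.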


14.9. Weak convergence of the distribution functions does not 
imply convergence of the densities 
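Let $F$, $F_n$, $n \ge 1$, be d.f.s such that their densities $f$, $f_n$, $n \ge 1$, exist.
According to the well known Scheffé theorem (see Billingsley
1968), if $f_n(x) \to f(x)$ as $n \to \infty$ for almost all $x \in \mathbb{R}^1$ then
$F_n \xrightarrow{w} F$ as $n \to \infty$. It is natural to ask whether or not the converse
is true. The example below shows that in general the answer is
negative. Consider the function

$$F_n(x) = \begin{cases} 0, & \text{if } x \le 0 \\ x\left(1 - \dfrac{\sin 2n\pi x}{2n\pi x}\right), & \text{if } 0 < x \le 1 \\ 1, & \text{if } x > 1. \end{cases}$$

Then $F_n$ is an absolutely continuous d.f. with density

$$f_n(x) = \begin{cases} 1 - \cos 2n\pi x, & \text{if } x \in [0, 1] \\ 0, & \text{otherwise.} \end{cases}$$

Also introduce the functions

$$F(x) = \begin{cases} 0, & \text{if } x \le 0 \\ x, & \text{if } 0 < x \le 1 \\ 1, & \text{if } x > 1, \end{cases} \qquad f(x) = \begin{cases} 1, & \text{if } x \in [0, 1] \\ 0, & \text{otherwise.} \end{cases}$$

Obviously $F$ and $f$ are the d.f. and the density corresponding to a
uniform distribution on the interval [0, 1] respectively. It is easy to
see that

$$F_n(x) \to F(x) \quad \text{as } n \to \infty \text{ for all } x \in \mathbb{R}^1.$$

However,

$$f_n(x) \not\to f(x) \quad \text{as } n \to \infty.$$

Therefore we have established that in general $F_n \xrightarrow{w} F \not\Rightarrow f_n \to f$.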
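
The contrast between the d.f.s and the densities in this example can be seen numerically. The sketch below evaluates both on a grid; the function names `F_n`, `f_n` and `sup_gap`, the grid size and the choice n = 50 are illustrative.

```python
import math

def F_n(n, x):
    """The d.f. F_n from the example, written as x - sin(2 n pi x) / (2 n pi)."""
    if x <= 0:
        return 0.0
    if x > 1:
        return 1.0
    return x - math.sin(2 * n * math.pi * x) / (2 * n * math.pi)

def f_n(n, x):
    """The density of F_n: 1 - cos(2 n pi x) on [0, 1], zero elsewhere."""
    return 1 - math.cos(2 * n * math.pi * x) if 0 <= x <= 1 else 0.0

def sup_gap(n, grid=2000):
    """Largest value of |F_n(x) - x| over an equally spaced grid in [0, 1]."""
    return max(abs(F_n(n, i / grid) - i / grid) for i in range(grid + 1))

n = 50
print(sup_gap(n), f_n(n, 1 / (2 * n)), f_n(n, 1 / n))
```

The gap between the d.f.s is bounded by 1/(2nπ) and so vanishes uniformly, while the density still takes values near 2 and near 0 at points arbitrarily close together, whereas the uniform density equals 1 on [0, 1].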


14.10. The convergence $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ does not
always imply that $X_n + Y_n \xrightarrow{d} X + Y$


Let $X$, $X_n$, $n \ge 1$, and $Y$, $Y_n$, $n \ge 1$, be r.v.s defined on the same
probability space. Suppose $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ as $n \to \infty$. Does
it follow from this that $X_n + Y_n \xrightarrow{d} X + Y$ as $n \to \infty$? There are
cases when the answer to this question is positive, for example if
$X_n$ and $Y_n$, $n \ge 1$, are independent, or if the joint distribution of
$(X_n, Y_n)$ converges to that of $(X, Y)$ (see Grimmett and Stirzaker 1982).
The examples below aim to show that the answer is negative if we
drop the independence condition.

(i) Let $\{X_n, n \ge 1\}$ be i.i.d. r.v.s such that $X_n = 1$ or 0 with
probability $\frac{1}{2}$ each and put $Y_n = 1 - X_n$. Then $X_n \xrightarrow{d} X$ and
$Y_n \xrightarrow{d} Y$ as $n \to \infty$ where each of $X$ and $Y$ takes the values 1 and 0
with equal probabilities. Further, since $X_n + Y_n = 1$, it is obvious
that $X_n + Y_n$ does not tend in distribution to the sum $X + Y$ which, for
independent $X$ and $Y$, is a r.v. with three possible values, 0, 1 and 2,
with probabilities $\frac{1}{4}$, $\frac{1}{2}$ and $\frac{1}{4}$ respectively.
(ii) Suppose now the sequences of r.v.s $\{X_n, n \ge 1\}$ and $\{Y_n, n \ge 1\}$
are such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ where $X \sim N(0, 1)$, $Y \sim N(0, 1)$.
If for each $n$, $X_n$ and $Y_n$ are independent, then
$X_n + Y_n \xrightarrow{d} Z$ with $Z \sim N(0, 2)$. Moreover, in this case the
distribution of $(X_n, Y_n)$ converges to the standard two-dimensional
normal distribution with zero mean and covariance matrix
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.
Let us now drop the condition that $X_n$ and $Y_n$ are independent.
Again take $\{X_n, n \ge 1\}$ such that $X_n \xrightarrow{d} X$ with $X \sim N(0, 1)$ and let
$Y_n = X_n$ for all $n \in \mathbb{N}$. Then $Y_n \xrightarrow{d} Y$ where $Y \sim N(0, 1)$. Obviously
the sum $X_n + Y_n = 2X_n$ and it converges in distribution to a r.v. $Z$
where $Z \sim N(0, 4)$ but not to a r.v. distributed $N(0, 2)$ as one might have
expected from the independent case.


14.11. The convergence in probability $X_n \xrightarrow{P} X$ does not
always imply that $g(X_n) \xrightarrow{P} g(X)$ for any function g


The following result is well known and is used in many
probabilistic problems (see Feller 1971; Billingsley 1995; Serfling
1980).
If $X$, $X_n$, $n \ge 1$, are r.v.s such that $X_n \xrightarrow{P} X$ as $n \to \infty$ and $g(x)$,
$x \in \mathbb{R}^1$, is a continuous function, then $g(X_n) \xrightarrow{P} g(X)$ as $n \to \infty$.
By a specific example we show that the continuity of $g$ is an
essential condition in the sense that it cannot be replaced by
measurability only. To see this, consider the function

$$g(x) = \begin{cases} 0, & \text{if } x \le 0 \\ 1, & \text{if } x > 0. \end{cases}$$

The sequence $\{X_n, n \ge 1\}$ can be taken arbitrarily but so as to
satisfy the properties $X_n > 0$ for all $n \in \mathbb{N}$ and $X_n \xrightarrow{P} 0$ as $n \to \infty$.
For example, let $X_n$ take the values 1 and $n^{-1}$ with probabilities
$n^{-1}$ and $1 - n^{-1}$ respectively. Then obviously $X_n \xrightarrow{P} X$ where $X = 0$
a.s. Moreover, for each $n$ we have $g(X_n) = 1$. However, $g(X) = 0$
and hence $g(X_n)$ cannot converge in any reasonable sense to $g(X)$.
In particular, $g(X_n) \overset{P}{\nrightarrow} g(X)$ as $n \to \infty$ despite the fact that
$X_n \xrightarrow{P} X$.

We come to the same conclusion by considering the function $g$
defined above and the sequence of r.v.s $\{X_n, n \ge 1\}$ where
$X_n \sim N(0, \sigma^2/n)$, $\sigma^2 > 0$. Obviously $X_n \xrightarrow{P} X$ as $n \to \infty$ with $X = 0$ a.s.
Since $X_n$ is symmetric, we have for each $n$,

$$g(X_n) = \begin{cases} 0, & \text{with probability } \frac{1}{2} \\ 1, & \text{with probability } \frac{1}{2}. \end{cases}$$

However, $g(X) = 0$ a.s. and hence $g(X_n) \overset{P}{\nrightarrow} g(X)$ as $n \to \infty$.


14.12. Convergence in variation implies convergence in 
distribution but the converse is not always true 


Let $X$ and $Y$ be discrete r.v.s such that

$$P[X = a_k] = p_k, \quad P[Y = a_k] = q_k$$

where

$$p_k \ge 0, \quad q_k \ge 0, \quad \sum_k p_k = 1, \quad \sum_k q_k = 1.$$

If $F$ and $G$ are the d.f.s of $X$ and $Y$ respectively, then the distance in
variation, $v(F, G)$, is defined by

(1)  $v(F, G) = \sum_k |p_k - q_k|.$

If $X$ and $Y$ are absolutely continuous r.v.s whose d.f.s and
densities are $F$, $G$ and $f$, $g$, then $v(F, G)$ is defined by

(2)  $v(F, G) = \int_{-\infty}^{\infty} |f(x) - g(x)| \, dx.$

Suppose $F$, $F_n$, $n \ge 1$, are the d.f.s of the r.v.s $X$, $X_n$, $n \ge 1$,
respectively. If $v(F_n, F) \to 0$ as $n \to \infty$ we write $F_n \xrightarrow{v} F$ and also
$X_n \xrightarrow{v} X$ and say that the sequence $\{X_n\}$ converges in variation
to $X$ as $n \to \infty$. It is easy to see that convergence in variation
implies convergence in distribution, that is, $F_n \xrightarrow{v} F \Rightarrow F_n \xrightarrow{d} F$.
However, as we shall now see, the converse is not true.
(i) Let $F_n$ be the d.f. of a r.v. $X_n$ concentrated at the point $1/n$. Then
$F_n \xrightarrow{d} F_0$ as $n \to \infty$ where $F_0$ is the d.f. of the r.v. $X_0 \equiv 0$, while
the quantity $v(F_n, F_0)$ calculated by (1) does not tend to zero as
$n \to \infty$.
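
For the point masses of example (i), the distance defined by (1) can be computed directly. In the sketch below a discrete distribution is represented as a dictionary from atoms to probabilities; this representation and the function name `variation_distance` are illustrative assumptions.

```python
from fractions import Fraction

def variation_distance(p, q):
    """Sum over all atoms of |p_k - q_k|, as in formula (1)."""
    support = set(p) | set(q)
    return sum(
        (abs(p.get(a, Fraction(0)) - q.get(a, Fraction(0))) for a in support),
        Fraction(0),
    )

point_mass_at_zero = {Fraction(0): Fraction(1)}
distances = [
    variation_distance({Fraction(1, n): Fraction(1)}, point_mass_at_zero)
    for n in (1, 10, 1000)
]
print(distances)
```

The atoms 1/n and 0 never coincide, so the distance stays at its maximal value 2 for every n, however close the two atoms are.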
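(ii) Let $F(x)$, $x \in \mathbb{R}^1$, be a d.f. with density $f(x)$, $x \in \mathbb{R}^1$. Our goal is
to construct a sequence of d.f.s $\{F_n, n \ge 1\}$ such that $F_n \xrightarrow{w} F$ but
$F_n \overset{v}{\nrightarrow} F$ as $n \to \infty$. Denote

$$I_n = \int_{-\infty}^{\infty} f(x) \cos^2 nx \, dx, \quad J_n = \int_{-\infty}^{\infty} f(x) \sin^2 nx \, dx, \quad n \ge 1.$$

The obvious identity $I_n + J_n = \int_{-\infty}^{\infty} f(x) \, dx = 1$ implies that
the numerical sequences $\{I_n, n \ge 1\}$ and $\{J_n, n \ge 1\}$ cannot
simultaneously tend to zero as $n \to \infty$. Thus, we can assume that
e.g. $I_n \not\to 0$ as $n \to \infty$. In such a case we introduce the function

$$f_n(x) = c_n f(x)(1 + \cos nx), \quad x \in \mathbb{R}^1$$

where $c_n^{-1} = \int_{-\infty}^{\infty} f(x)(1 + \cos nx) \, dx$. Then for each $n$ the
function $f_n$ is a density; let $F_n$ be the corresponding d.f. Let us
try to find the limit of the sequence $\{F_n\}$ as $n \to \infty$. Since $f$ is a
density, then the well known Riemann-Lebesgue lemma (see e.g.
Kolmogorov and Fomin 1970)

$$\int_B f(x) \cos nx \, dx \to 0 \quad \text{as } n \to \infty$$

holds for any Borel set $B \in \mathcal{B}^1$. Hence

$$\int_B f_n(x) \, dx \to \int_B f(x) \, dx \quad \text{as } n \to \infty \quad \Rightarrow \quad F_n \xrightarrow{w} F \quad \text{as } n \to \infty.$$

Let us now calculate the distance in variation $v(F_n, F)$. We have

$$v(F_n, F) = \int_{-\infty}^{\infty} |f_n(x) - f(x)| \, dx = \int_{-\infty}^{\infty} |c_n f(x) \cos nx - (1 - c_n) f(x)| \, dx$$
$$\ge \int_{-\infty}^{\infty} |c_n f(x) \cos nx| \, dx - \int_{-\infty}^{\infty} |(1 - c_n) f(x)| \, dx.$$

We find that $c_n \to 1$ as $n \to \infty$ and

$$\int_{-\infty}^{\infty} f(x) |\cos nx| \, dx \ge \int_{-\infty}^{\infty} f(x) \cos^2 nx \, dx = I_n \not\to 0.$$

Therefore $v(F_n, F) \not\to 0$ as $n \to \infty$, i.e. the sequence of d.f.s $\{F_n\}$
does not converge in variation to $F$ despite the weak convergence
established above.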


14.13. There is no metric corresponding to almost sure 
convergence 

It 1s well known that each of the following kinds of convergence: 

(i) in distribution; (ii) in probability; (iii) in L’-sense, can be 

metrized (see Ash and Gardner 1975; Dudley 1976). It is therefore 

natural to ask whether almost sure convergence can be metrized. 

Let us show that the answer 1s negative. 

Let ‘k denote a set of r.v.s defined on the probability space (Q, F 
,P)andd:R x R > R" a metric on XR, that is, d is non-negative, 
symmetric and satisfies the triangle inequality. Let us check the 
correctness of the following statement: 


Bor.X,. Ay Roped, dX. XV if AX. 


Suppose such a function d does exist. Let {X,, n = 1} be a 


sequence of r.v.s converging to some r.v._X in probability but not 
almost surely (see Example 14.3). Then for some 0 > 0 the 
inequality d(X,,, X) > 0 will be satisfied for infinitely many n. Let 


A denote the set of these n. However, since Xn =X there exists 
a subsequence {X, 7, nj; <¢ A} of {X,, n ¢ A} converging to X 
almost surely. But this would mean that d(X,;, X) — 0 as np — ©, 
which is impossible because d(X,,;,, X) = 0 for each nz € A. 


Thus the statement given above is incorrect and we conclude
that a.s. convergence is not metrizable. Note, however, that this
type of convergence can be metrized iff the probability space is
atomic (see Thomasian 1957; Tomkins 1975a).


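14.14. Complete convergence of sequences of random
variables is stronger than almost sure convergence

The sequence of r.v.s $\{X_n, n \ge 1\}$ is called completely convergent
to 0 if

$$\lim_{n\to\infty} \sum_{m=n}^{\infty} P[|X_m| > \varepsilon] = 0 \quad\text{for every } \varepsilon > 0. \tag{1}$$

In this case the following notation is used: $X_n \xrightarrow{c} 0$.
In order to compare this mode of convergence with a.s.
convergence, recall that

$$X_n \xrightarrow{\text{a.s.}} 0 \iff \lim_{n\to\infty} P\Big[\bigcup_{m=n}^{\infty} \{|X_m| > \varepsilon\}\Big] = 0 \quad\text{for every } \varepsilon > 0. \tag{2}$$

Since the probability P is semi-additive, we obtain

$$P\Big[\bigcup_{m=n}^{\infty} \{|X_m| > \varepsilon\}\Big] \le \sum_{m=n}^{\infty} P[|X_m| > \varepsilon]$$

which immediately implies that $X_n \xrightarrow{c} 0 \Rightarrow X_n \xrightarrow{\text{a.s.}} 0$. However,
the converse is not always true. To see this, consider the
probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and P is
the Lebesgue measure. Take the sequence $\{X_n, n \ge 1\}$ where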


1 if 
Xy = Xy(w) = +0 if 


Then clearly this sequence converges to zero almost surely but not 
completely. 
These two kinds of convergence are equivalent if the r.v.s $X_n$,
$n \ge 1$, are independent (Hsu and Robbins 1947).
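
The gap between the two modes can be illustrated by a small computation. This is a sketch under an assumed concrete choice of the sequence on ([0,1], Lebesgue), namely $X_n(\omega) = 1$ for $0 \le \omega < 1/n$ and $X_n(\omega) = 0$ otherwise; the function names are hypothetical.

```python
from fractions import Fraction


def tail_probability(n):
    """P[|X_n| > eps] for the assumed X_n = indicator of [0, 1/n), 0 < eps < 1."""
    return Fraction(1, n)


def partial_sum(n_terms):
    """Partial sum of the series that must converge for complete convergence."""
    return sum(tail_probability(n) for n in range(1, n_terms + 1))


def last_nonzero_index(omega):
    """Largest n with X_n(omega) = 1, i.e. omega < 1/n, for a fixed omega > 0."""
    n = 0
    while omega < Fraction(1, n + 1):
        n += 1
    return n


if __name__ == "__main__":
    # For every omega > 0 the path X_n(omega) is eventually 0 (a.s. convergence) ...
    print(last_nonzero_index(Fraction(1, 10)))
    # ... while the series of probabilities is the harmonic series and diverges.
    print(float(partial_sum(100)), float(partial_sum(1000)))
```

For each fixed $\omega > 0$ only finitely many $X_n(\omega)$ are non-zero, yet the partial sums of $P[|X_n| > \varepsilon]$ grow without bound, so the convergence is almost sure but not complete.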


14.15. The almost sure uniform convergence of a random 
sequence implies its complete convergence, but the 
converse is not true 


Recall that the sequence of r.v.s $\{X_n, n \ge 1\}$ is said to converge
almost surely uniformly to a r.v. X if there exists a set $A \in \mathcal{F}$ with
P(A) = 0 such that $X_n = X_n(\omega)$ converge uniformly (in $\omega$) to X on
the complement $A^c$. Note that almost sure uniform convergence
implies complete convergence discussed in Example 14.14 (see
Lukacs 1975). Thus we come to the question of the validity of the
converse statement. To find the answer we consider an example.

Let the probability space $(\Omega, \mathcal{F}, P)$ be given by $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$
and P is the Lebesgue measure. Consider the sequence $\{X_n, n \ge 1\}$
of r.v.s such that


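$$X_n = X_n(\omega) = \begin{cases} 1 - 2n^2\omega, & \text{if } 0 \le \omega < 1/(2n^2) \\ 0, & \text{if } 1/(2n^2) \le \omega \le 1 - 1/(2n^2) \\ 1 - 2n^2 + 2n^2\omega, & \text{if } 1 - 1/(2n^2) < \omega \le 1. \end{cases}$$

For arbitrary $\varepsilon > 0$ (we may take $\varepsilon < 1$), $0 \le X_n < \varepsilon$ iff
$\omega \in ((1-\varepsilon)/(2n^2),\ 1 - (1-\varepsilon)/(2n^2))$. Hence

$$P[|X_n| \ge \varepsilon] = P[X_n \ge \varepsilon] = (1-\varepsilon)/n^2$$

so that

$$\sum_{n=1}^{\infty} P[|X_n| \ge \varepsilon] = (1-\varepsilon) \sum_{n=1}^{\infty} (1/n^2) < \infty.$$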


This means that the sequence $\{X_n\}$ converges completely to zero.
Now let us introduce the sets

$$B_n = \Big[0, \frac{1}{4n^2}\Big) \cup \Big(1 - \frac{1}{4n^2}, 1\Big], \quad n \ge 1.$$

Clearly $P(B_n) = 1/(2n^2)$.
Suppose for some set A with P(A) = 0, $X_n$ converges to zero almost
surely uniformly on $A^c$. Then there exists a number $n_\varepsilon \in \mathbb{N}$
(independent of $\omega$) such that $|X_n| < \varepsilon < \frac12$ on $A^c$ provided $n \ge n_\varepsilon$.
However, we have $B_{n_\varepsilon} \cap A^c = \emptyset$ and $B_{n_\varepsilon} \subset A$. Hence
$P(A) \ge P(B_{n_\varepsilon}) = 1/(2n_\varepsilon^2) > 0$. This contradiction shows that the
sequence $\{X_n\}$ defined above does not converge almost surely
uniformly to zero.
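
The computations in this example can be checked with exact rational arithmetic. The sketch below assumes the piecewise-linear definition of $X_n$ with breakpoints at $1/(2n^2)$ and $1 - 1/(2n^2)$; the function names are illustrative.

```python
from fractions import Fraction


def x_n(n, omega):
    """Value X_n(omega) of the 'two-sided tent' on [0, 1]."""
    a = Fraction(1, 2 * n * n)
    if omega < a:
        return 1 - 2 * n * n * omega
    if omega <= 1 - a:
        return Fraction(0)
    return 1 - 2 * n * n + 2 * n * n * omega


def prob_at_least(n, eps):
    """P[X_n >= eps] for 0 < eps <= 1: two intervals of length (1 - eps) / (2 n^2)."""
    return (1 - eps) / (n * n)


if __name__ == "__main__":
    eps = Fraction(1, 2)
    # The series of probabilities converges (complete convergence) ...
    print(float(sum(prob_at_least(n, eps) for n in range(1, 1001))))
    # ... but every X_n still reaches the value 1, so the sup-norm never shrinks.
    print([x_n(n, Fraction(0)) for n in (1, 10, 100)])
```

The series is dominated by $(1-\varepsilon)\sum 1/n^2$, while each $X_n$ takes values above 1/2 on a set of positive probability, which is what rules out uniform convergence off a null set.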


14.16. Converging sequences of random variables such that 
the sequences of the expectations do not converge 


If the sequence of r.v.s $\{X_n\}$ converges in probability or a.s. to
some r.v. X, then under additional assumptions we can show that
the sequence of the expectations $\{EX_n\}$ will tend to EX. However,
in general such a statement is not true without appropriate
assumptions.
(i) Let $\{X_n, n \ge 1\}$ be r.v.s defined by


$$P[X_n = -n-4] = 1/(n+4), \quad P[X_n = -1] = 1 - 4/(n+4), \quad P[X_n = n+4] = 3/(n+4).$$

Obviously for any $\varepsilon > 0$ and all sufficiently large n we have

$$P[|X_n - (-1)| > \varepsilon] = 4/(n+4)$$

and hence $X_n \xrightarrow{P} (-1)$ as $n \to \infty$. On the other hand,

$$EX_n = 1 + 4/(n+4) \quad\text{and}\quad \lim_{n\to\infty} EX_n = 1.$$

Therefore

$$\lim_{n\to\infty} EX_n = 1 \ne -1 = E\big[P\text{-}\lim_{n\to\infty} X_n\big]$$

and the convergence in probability of $X_n$ to X is not sufficient to
ensure that $EX_n \to EX$. This can be explained by referring to the
standard result (see Lukacs 1975; Chow and Teicher 1978): if $X_n$,
$n \ge 1$ and X are $L^r$ r.v.s and $X_n \xrightarrow{L^r} X$ then $E[|X_n|^k] \to E[|X|^k]$ for
each $0 < k \le r$.


(ii) Consider the sequence $\{Y_n, n \ge 1\}$ of r.v.s where

$$Y_n(\omega) = \begin{cases} n^2, & \text{if } 0 < \omega < n^{-1} \\ 0, & \text{if } n^{-1} \le \omega \le 1 \end{cases}$$

and also the r.v. $Y(\omega) = 0$, $\omega \in [0,1]$. Then for every $\omega \in [0,1]$ we
have $Y_n(\omega) \to Y(\omega)$ as $n \to \infty$. However, $EY_n = n$ and $EY_n \not\to EY = 0$
as $n \to \infty$.
Let us note that in case (ii) $EY_n$ is unbounded, while in case (i)
$EX_n$ is bounded. According to Billingsley (1995), if $\{Z_n\}$ is
uniformly bounded and $\lim_{n\to\infty} Z_n = Z$ on a set of probability 1,
then $EZ = \lim_{n\to\infty} EZ_n$. Both cases (i) and (ii) show that uniform
boundedness is essential.
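
Both cases reduce to finite arithmetic and can be verified exactly. The following sketch uses the distributions as stated in (i) and (ii); the function names are hypothetical.

```python
from fractions import Fraction


def mean_case_i(n):
    """E[X_n] for the three-point law of case (i)."""
    law = {
        -(n + 4): Fraction(1, n + 4),
        -1: 1 - Fraction(4, n + 4),
        n + 4: Fraction(3, n + 4),
    }
    assert sum(law.values()) == 1  # the probabilities form a distribution
    return sum(value * prob for value, prob in law.items())


def prob_far_from_minus_one(n):
    """P[X_n != -1], the mass sitting at the two far atoms."""
    return Fraction(1, n + 4) + Fraction(3, n + 4)


def mean_case_ii(n):
    """E[Y_n] for Y_n = n^2 on an interval of length 1/n and 0 elsewhere."""
    return n * n * Fraction(1, n)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, mean_case_i(n), prob_far_from_minus_one(n), mean_case_ii(n))
```

The mass away from -1 vanishes like 4/(n+4) while the mean stays at 1 + 4/(n+4); in case (ii) the mean equals n although $Y_n \to 0$ pointwise.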


14.17. Weak $L^1$ convergence of random variables is weaker
than both weak convergence and convergence in $L^1$-sense

Recall that the sequence $\{X_n, n \ge 1\}$ of r.v.s in the space $L^1$ is said
to converge weakly in $L^1$ to the r.v. X iff for any bounded r.v. Y we
have

$$\lim_{n\to\infty} E[X_n Y] = E[XY]. \tag{1}$$

In this case the following notation is used: $X_n \xrightarrow{w, L^1} X$ as $n \to \infty$.


Clearly the limit X belongs to $L^1$ and it is unique up to
equivalence (see Neveu 1965; Chung 1974).

It is of general interest to clarify the connection between this
mode of convergence and the others discussed in the previous
examples. In particular, if $X_n \xrightarrow{w, L^1} X$, does it follow that $X_n \xrightarrow{w} X$
or that $X_n \xrightarrow{L^1} X$?

Remark. Here the notation $\xrightarrow{w}$ is used to denote the so-called weak
convergence of $X_n$ to X as $n \to \infty$ which in this case is equivalent
to convergence in distribution. In a more general context weak
convergence will be considered in Section 16.

To answer these questions consider the probability space $(\Omega, \mathcal{F}, P)$
where $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and P is the Lebesgue measure, and
take the following sequence of r.v.s $X_n(\omega) = \sin 2\pi n\omega$, $n \ge 1$. Note
that $\{X_n\}$ is not convergent in either sense, weak or $L^1$.
Nevertheless we shall show that $X_n \xrightarrow{w, L^1} 0$ in the sense of definition
(1).

Let Y be any bounded r.v., that is $Y = Y(\omega)$, $\omega \in [0,1]$ is a bounded
$\mathcal{F}$-measurable function. Then there is a sequence of stepwise
functions $\{Y^{(m)}(\omega), m \ge 1\}$ such that $Y^{(m)} \to Y$ as $m \to \infty$ (see
Loève 1978). By the Egorov theorem (see Kolmogorov and Fomin
1970; Royden 1968) for any $\varepsilon > 0$ we can find an open set $A_\varepsilon \subset [0,1]$
with $P(A_\varepsilon) < \varepsilon$ such that the convergence $Y^{(m)} \to Y$ as $m \to \infty$ is uniform for
$\omega \in A_\varepsilon^c = [0,1] \setminus A_\varepsilon$. Here we can also use the Lusin theorem (see
Kolmogorov and Fomin 1970) on the existence of a continuous
function $Y^*$ coinciding with Y on the complement of a set of
measure less than $\varepsilon$. In both cases, for stepwise or continuous $Y^*$, we have


$$E[X_n Y^*] = \int_0^1 Y^*(\omega) \sin 2\pi n\omega \, d\omega \to 0 \quad\text{as } n \to \infty.$$

Since Y and $Y^*$ are bounded and $Y^*$ is close to Y, the difference
$|E[X_n Y^*] - E[X_n Y]|$ can be made arbitrarily small. Hence $E[X_n Y] \to 0$
as $n \to \infty$ for any bounded r.v. Y. Therefore $X_n \xrightarrow{w, L^1} 0$ as $n \to \infty$.
However, as noted above, neither of the relations $X_n \xrightarrow{w} 0$ or
$X_n \xrightarrow{L^1} 0$ is true.
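
A short numerical sketch of this contrast, assuming the simplest bounded test variable, the indicator $Y = 1_{[0,a]}$ (a stepwise function); the helper names are hypothetical. The pairing $E[X_n Y]$ is available in closed form, while $E|X_n|$ is approximated by a midpoint rule.

```python
import math


def pairing_with_indicator(n, a):
    """E[X_n * 1{omega <= a}] for X_n(omega) = sin(2 pi n omega), closed form."""
    return (1.0 - math.cos(2.0 * math.pi * n * a)) / (2.0 * math.pi * n)


def l1_norm(n, steps=20000):
    """Midpoint-rule approximation of E|X_n| over [0, 1]."""
    h = 1.0 / steps
    total = sum(abs(math.sin(2.0 * math.pi * n * (i + 0.5) * h)) for i in range(steps))
    return total * h


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, pairing_with_indicator(n, 0.3), l1_norm(n))
```

The pairings are bounded by $1/(\pi n)$ and so tend to zero, whereas $E|X_n|$ stays equal to $2/\pi$ for every n, so there is no convergence to 0 in $L^1$.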


14.18. A converging sequence of random variables whose
Cesaro means do not converge

Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s. Then the following
implication holds:

$$X_n \xrightarrow{\text{a.s.}} 0 \ \text{ as } n \to \infty \quad\Longrightarrow\quad \frac1n (X_1 + \cdots + X_n) \xrightarrow{\text{a.s.}} 0 \ \text{ as } n \to \infty. \tag{1}$$

This follows from the standard theorem in analysis about the
Cesaro means.

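Our aim now is to show that almost sure convergence in (1)
cannot be replaced by convergence in probability. Indeed, consider
the sequence of independent r.v.s $\{\xi_n, n \ge 1\}$ where $\xi_n$ has a d.f. $F_n$
given by

$$F_n(x) = \begin{cases} 0, & \text{if } x \le 0 \\ 1 - 1/(x+n), & \text{if } x > 0. \end{cases}$$

Then for every fixed $\varepsilon > 0$ we have

$$P[|\xi_n| > \varepsilon] = 1 - F_n(\varepsilon) = 1/(\varepsilon + n)$$

which means that $\xi_n \xrightarrow{P} 0$ as $n \to \infty$. Let us show that the Cesaro
means

$$\eta_n = \frac1n (\xi_1 + \cdots + \xi_n) \overset{P}{\not\to} 0 \quad\text{as } n \to \infty.$$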


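Denoting $M_n = \max\{\xi_1, \ldots, \xi_n\}$ and taking into account the
independence of the variables $\xi_k$ we can easily show that for any
$x > 0$

$$P[M_n < x] = \Big(1 - \frac{1}{x+1}\Big)\Big(1 - \frac{1}{x+2}\Big) \cdots \Big(1 - \frac{1}{x+n}\Big) \le \Big(1 - \frac{1}{x+n}\Big)^n.$$

Therefore

$$P[M_n/n < \varepsilon] \le \Big(1 - \frac{1}{\varepsilon n + n}\Big)^n. \tag{2}$$

Since $[M_n/n > \varepsilon] \subset [\eta_n > \varepsilon]$,

$$P[M_n/n > \varepsilon] \le P[\eta_n > \varepsilon] \quad\Longrightarrow\quad P[M_n/n \le \varepsilon] \ge P[\eta_n \le \varepsilon].$$

Combining the last relation with (2) we see that

$$P[\eta_n \le \varepsilon] \le \Big(1 - \frac{1}{\varepsilon n + n}\Big)^n$$

and hence

$$\lim_{n\to\infty} P[\eta_n > \varepsilon] = 1 - \lim_{n\to\infty} P[\eta_n \le \varepsilon] \ge 1 - \exp(-(\varepsilon+1)^{-1}) > 0.$$

This means that $\eta_n$ does not converge to zero in probability.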


Therefore in general $\xi_n \xrightarrow{P} 0 \not\Rightarrow \frac1n(\xi_1 + \cdots + \xi_n) \xrightarrow{P} 0$ as $n \to \infty$.
Finally, let us indicate one additional case leading to the same
result. Let $\{X_n, n \ge 1\}$ be independent r.v.s, where $X_n$ takes the
values $2^n$ and 0, with probabilities $n^{-1}$ and $1 - n^{-1}$ respectively.
Then $X_n \xrightarrow{P} 0$ but $\frac1n(X_1 + \cdots + X_n) \overset{P}{\not\to} 0$ as $n \to \infty$ (the details
are left to the reader).
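
The bound on the maximum can be made fully explicit. As a sketch (the function name is illustrative), the product $\prod_{k=1}^{n} (1 - 1/(x+k))$ telescopes to $x/(x+n)$, so with $x = \varepsilon n$ the probability that the largest of the first n variables stays below $\varepsilon n$ equals $\varepsilon/(1+\varepsilon)$ for every n, assuming the d.f. $1 - 1/(x+k)$ for $x > 0$:

```python
from fractions import Fraction


def prob_max_at_most(n, x):
    """P[max(xi_1, ..., xi_n) <= x] for independent xi_k with d.f. 1 - 1/(x + k), x > 0."""
    prob = Fraction(1)
    for k in range(1, n + 1):
        prob *= 1 - 1 / (x + k)
    return prob


if __name__ == "__main__":
    eps = Fraction(1, 2)
    for n in (10, 50, 200):
        exact = prob_max_at_most(n, eps * n)
        bound = (1 - 1 / (eps * n + n)) ** n
        print(n, exact, float(bound))
```

Since the Cesaro mean dominates the maximum divided by n for non-negative variables, the probability that the mean exceeds $\varepsilon$ is at least $1/(1+\varepsilon)$ for every n in this sketch.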


SECTION 15. LAWS OF LARGE NUMBERS 


Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s defined on the probability
space $(\Omega, \mathcal{F}, P)$. Define $S_n = X_1 + \cdots + X_n$, $a_k = EX_k$, $A_n = ES_n =
a_1 + \cdots + a_n$.

We say that the sequence $\{X_n\}$ satisfies the weak law of large
numbers (WLLN) (or that $\{X_n\}$ obeys the WLLN) if
$\frac1n S_n - \frac1n A_n \xrightarrow{P} 0$ as $n \to \infty$, that is, if for any $\varepsilon > 0$ we have

$$\lim_{n\to\infty} P\Big[\Big|\frac1n S_n - \frac1n A_n\Big| > \varepsilon\Big] = 0.$$

Further, if $\frac1n S_n - \frac1n A_n \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$, that is, if

$$P\Big[\omega : \lim_{n\to\infty} \Big(\frac1n S_n(\omega) - \frac1n A_n\Big) = 0\Big] = 1,$$

we say that the sequence $\{X_n\}$ satisfies the strong law of large
numbers (SLLN) (or that $\{X_n\}$ obeys the SLLN).


Let us formulate some of the basic results concerning the WLLN
and the SLLN. It is obvious that either $\{X_n\}$ is a sequence of
identically distributed r.v.s or these variables are arbitrarily
distributed.

Khintchine theorem. Suppose that $\{X_n, n \ge 1\}$ is a sequence of
i.i.d. r.v.s with $E[|X_1|] < \infty$. Then this sequence satisfies the WLLN
and $\frac1n S_n \xrightarrow{P} a$ as $n \to \infty$ where $a = EX_1$.


Kolmogorov theorem 1. Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d.
r.v.s. The existence of $E[|X_1|]$ is a necessary and sufficient
condition for the sequence $\{X_n\}$ to satisfy the SLLN and $\frac1n S_n \xrightarrow{\text{a.s.}} a$
as $n \to \infty$ where $a = EX_1$.

Markov theorem. Suppose $\{X_n, n \ge 1\}$ is an arbitrary sequence
of r.v.s such that the following condition holds:

$$\frac{1}{n^2} V[X_1 + \cdots + X_n] \to 0 \ \text{ as } n \to \infty \quad\text{(Markov condition)}. \tag{1}$$

Then $\{X_n\}$ satisfies the WLLN.

Kolmogorov theorem 2. Let $\{X_n, n \ge 1\}$ be a sequence of
independent r.v.s with $\sigma_n^2 = VX_n < \infty$, $n \ge 1$. Suppose the
following condition is fulfilled:

$$\sum_{n=1}^{\infty} \frac{\sigma_n^2}{n^2} < \infty \quad\text{(Kolmogorov condition)}. \tag{2}$$

Then the given sequence satisfies the SLLN.

In the examples below we refer to (1) and (2) as the Markov 
condition and the Kolmogorov condition respectively. 

A detailed presentation of the laws of large numbers can be 
found in the books by Doob (1953), Gnedenko (1962), Fisz (1963), 
Révész (1967), Feller (1971), Chung (1974), Petrov (1975), Laha 
and Rohatgi (1979), Billingsley (1995) and Shiryaev (1995). 

In this section we consider examples which illustrate the 
importance of the conditions ensuring the validity of the WLLN or 
of the SLLN as well as the relationship between these two laws 
and some other related topics. 


15.1. The Markov condition is sufficient but not necessary for 
the weak law of large numbers 
(i) Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s such that $X_n$
has a $\chi^2$-distribution with n degrees of freedom: that is, $X_n$ has a
density

$$f_n(x) = \begin{cases} [2\Gamma(n/2)]^{-1} (x/2)^{(n-2)/2} \exp(-x/2), & \text{if } x > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Then $EX_n = n$, $VX_n = 2n$ and clearly the Markov condition is not
satisfied. Hence we cannot apply the Markov theorem to determine
whether the sequence $\{X_n\}$ satisfies the WLLN.
We use the following result (see Feller (1971) or Shiryaev
(1995)).
If $\{\xi_n, n \ge 1\}$ is a sequence of r.v.s and $\xi_n$ has a ch.f. $\phi_n$, then

$$\xi_n \xrightarrow{P} 0 \ \text{ as } n \to \infty \iff \phi_n(t) \to 1 \ \text{ as } n \to \infty \ \text{ for all } t \in \mathbb{R}^1.$$

The ch.f. $\psi_n$ of $X_n$ is $\psi_n(t) = (1 - 2it)^{-n/2}$. Then calculating the
ch.f. $\tilde\psi_n$ of $(S_n - ES_n)/n$ where $S_n = X_1 + \cdots + X_n$ and
$ES_n = \frac12 n(n+1)$ we find that $\tilde\psi_n(t) \to 1$ as $n \to \infty$ for all $t \in \mathbb{R}^1$ and
in view of the above result we conclude that the sequence $\{X_n\}$
does satisfy the WLLN.
Note that analogous reasoning leads to the same conclusion if
we consider the sequence of discrete independent r.v.s $\{Y_n, n \ge 1\}$
where $P[Y_n = \pm 1] = \frac12(1 - 2^{-n})$ and $P[Y_n = \pm 2^n] = 2^{-(n+1)}$.
Therefore the Markov condition is not necessary for the WLLN.
(ii) Let $\{Y_n, n \ge 1\}$ be independent r.v.s where $Y_n$ has a density

$$f_n(x) = (\sqrt2 \sigma_n)^{-1} \exp(-\sqrt2 |x| / \sigma_n), \quad x \in \mathbb{R}^1.$$

It is easy to show that $EY_n = 0$ and $VY_n = \sigma_n^2$. Let us choose $\sigma_n$ in
the following special way: $\sigma_n^2 = n^{1+\delta}$ where $0 < \delta < 1$. Then the
Markov condition is not fulfilled. Nevertheless, as will be shown,
the sequence $\{Y_n\}$ satisfies the WLLN. However, to prove this
statement we need the following result (see Feller 1971): let $\eta_k$ be
independent r.v.s and let

$$P\Big[\frac1n |\eta_k| \ge \varepsilon\Big] < \delta \tag{1}$$

for any positive $\varepsilon$, $\delta$, k = 1, 2, ..., n and all sufficiently large n.
Denote by $\{\bar\eta_k\}$ the truncated sequence with some constant c > 0,
that is $\bar\eta_k = \eta_k$ if $|\eta_k| < c$ and $\bar\eta_k = c$ if $|\eta_k| \ge c$. Then $\{\eta_k\}$ obeys the
WLLN iff the following two conditions hold for any $\varepsilon > 0$ and c > 0:

$$\lim_{n\to\infty} \sum_{k=1}^{n} P\Big[\frac1n |\eta_k| > \varepsilon\Big] = 0, \tag{2}$$

$$\lim_{n\to\infty} \frac{1}{n^2} \sum_{k=1}^{n} V\bar\eta_k = 0. \tag{3}$$


Now for any fixed $\varepsilon > 0$ we can easily show that

$$P\Big[\frac1n |Y_k| \ge \varepsilon\Big] = \exp[-\sqrt2 \varepsilon n / k^{(1+\delta)/2}], \quad k = 1, \ldots, n,$$

and for sufficiently large n the right-hand side can be made
arbitrarily small. Thus condition (1) holds.

For given $\varepsilon > 0$ and constant c > 0 let N be an integer satisfying
the relations: $\varepsilon N \le c$ and $\varepsilon(N+1) > c$. Choose n > N. Then

$$\lim_{n\to\infty} \sum_{k=1}^{N} P[|Y_k| > \varepsilon n] = \lim_{n\to\infty} \sum_{k=1}^{N} \exp[-\sqrt2 \varepsilon n / k^{(1+\delta)/2}]. \tag{4}$$

Since the sum on the right-hand side of (4) contains a finite
number of terms and each term tends to zero as $n \to \infty$, (4) implies
(2).
It then remains for us to check condition (3). A direct calculation
shows that

$$V\bar{Y}_k = k^{1+\delta}[1 - \exp(-c\sqrt2 / k^{(1+\delta)/2})] - \sqrt2 c\, k^{(1+\delta)/2} \exp(-c\sqrt2 / k^{(1+\delta)/2}).$$

Using a Taylor expansion we find

$$V\bar{Y}_k = c^2 + (c^3 / k^{(1+\delta)/2})\, c_k,$$

where $c_k$ includes higher-order terms (their exact expressions are
not important). From this we can easily derive (3).

Therefore according to the Feller theorem cited above the
sequence $\{Y_n\}$ satisfies the WLLN. Again, as in case (i), the
Markov condition is not satisfied.


15.2. The Kolmogorov condition for arbitrary random 
variables is sufficient but not necessary for the strong 
law of large numbers 


Consider the sequence $\{X_n, n \ge 1\}$ of independent r.v.s where

$$P[X_n = \pm 1] = \tfrac12(1 - 2^{-n}), \quad P[X_n = 2^n] = P[X_n = -2^n] = 2^{-(n+1)}.$$

Obviously $EX_n = 0$, $\sigma_n^2 = VX_n = 1 - 2^{-n} + 2^n$ so that $\sum_{n=1}^{\infty} \sigma_n^2/n^2$
diverges. Thus the Kolmogorov condition, $\sum_{n=1}^{\infty} \sigma_n^2/n^2 < \infty$, is
not satisfied. Nevertheless we shall show that $\{X_n\}$ obeys the
SLLN.
Recall that two sequences of r.v.s $\{\xi_n\}$ and $\{\eta_n\}$ are said to be
equivalent in the sense of Khintchine if $\sum_{n=1}^{\infty} P[\xi_n \ne \eta_n] < \infty$.
According to Révész (1967) two such sequences simultaneously
satisfy or do not satisfy the SLLN.

Introduce the sequence $\{Y_n, n \ge 1\}$ where

$$P[Y_n = 1] = P[Y_n = -1] = \tfrac12(1 - 2^{-n}), \quad P[Y_n = 0] = 2^{-n}.$$

Clearly $EX_n = EY_n$ and $P[X_n \ne Y_n] = 2^{-n}$ for $n \in \mathbb{N}$. Since the series
$\sum_{n=1}^{\infty} P[X_n \ne Y_n] = \sum_{n=1}^{\infty} 2^{-n}$ is convergent, the sequences
$\{X_n\}$ and $\{Y_n\}$ are equivalent in the sense of Khintchine. Further,
$VY_n = 1 - 2^{-n}$ so that $\sum_{n=1}^{\infty} VY_n/n^2 < \infty$. Thus the Kolmogorov
condition is satisfied for the sequence $\{Y_n\}$ and therefore $\{Y_n\}$
obeys the SLLN. By the above result it follows that the sequence
$\{X_n\}$ also obeys the SLLN.

Thus we have shown that the Kolmogorov condition for
arbitrarily distributed r.v.s is not necessary for the validity of the
SLLN.
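
The variances and the two series in this example are easy to check exactly. The sketch below uses the two laws as stated, with $X_n = \pm 1$ with probability $\frac12(1 - 2^{-n})$ each and $X_n = \pm 2^n$ with probability $2^{-(n+1)}$ each; the function names are hypothetical.

```python
from fractions import Fraction


def var_x(n):
    """V[X_n] for the four-point law of X_n (the mean is 0 by symmetry)."""
    p_one = Fraction(1, 2) * (1 - Fraction(1, 2**n))  # P[X_n = 1] = P[X_n = -1]
    p_big = Fraction(1, 2 ** (n + 1))                  # P[X_n = 2^n] = P[X_n = -2^n]
    assert 2 * p_one + 2 * p_big == 1
    return 2 * p_one * 1 + 2 * p_big * (2**n) ** 2


def var_y(n):
    """V[Y_n] for Y_n = +1 or -1 with probability (1 - 2^-n)/2 each, 0 otherwise."""
    return 1 - Fraction(1, 2**n)


def kolmogorov_partial_sum(variance, n_terms):
    """Partial sum of sum_n V_n / n^2."""
    return sum(variance(n) / n**2 for n in range(1, n_terms + 1))


if __name__ == "__main__":
    print(var_x(5), var_y(5))
    print(float(kolmogorov_partial_sum(var_y, 200)))   # stays bounded
    print(float(kolmogorov_partial_sum(var_x, 40)))    # explodes
    print(float(sum(Fraction(1, 2**n) for n in range(1, 51))))  # sum of P[X_n != Y_n]
```

The Kolmogorov series stays bounded for $\{Y_n\}$ and blows up for $\{X_n\}$, while the series of $P[X_n \ne Y_n]$ is geometric and therefore convergent.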


15.3. A sequence of independent discrete random variables 
satisfying the weak but not the strong law of large 
numbers 


Let $\{X_n, n \ge 2\}$ be independent r.v.s such that

$$P[X_n = \pm n] = \frac{1}{2n \log n}, \quad P[X_n = 0] = 1 - \frac{1}{n \log n}.$$

Consider the events $A_n = \{|X_n| \ge n\}$, $n \ge 2$. Then

$$P(A_n) = \frac{1}{n \log n} \quad\Longrightarrow\quad \sum_{n=2}^{\infty} P(A_n) = \infty.$$

The divergence of the series $\sum_{n=2}^{\infty} P(A_n)$, the mutual
independence of the variables $X_n$ and the Borel-Cantelli lemma
allow us to conclude that the event $[A_n \text{ i.o.}]$ has probability 1. In
other words,

$$P[A_n \text{ i.o.}] = 1 \quad\Longrightarrow\quad P\Big[\lim_{n\to\infty} \frac{S_n}{n} \ne 0\Big] = 1.$$

Therefore the sequence $\{X_n, n \ge 2\}$ cannot satisfy the SLLN.

Now we shall show that $\{X_n\}$ obeys the WLLN. Obviously $VX_k
= k/\log k$. Since the function $x/\log x$ has a local minimum at x = e
and $\sum_{k=3}^{n} k/\log k$ is a lower Riemann sum for the integral
$\int_3^{n+1} (x/\log x)\,dx$, we easily obtain that

$$\frac{1}{n^2} \sum_{k=2}^{n} VX_k \le \frac{1}{n^2} \Big[\frac{2}{\log 2} + \int_3^{n+1} \frac{x}{\log x}\,dx\Big] \le \frac{2}{n^2 \log 2} + \frac{(n-2)(n+1)}{n^2 \log n} \to 0 \quad\text{as } n \to \infty.$$

Thus the Markov condition for $\{X_n\}$ is satisfied and therefore the
sequence $\{X_n\}$ obeys the WLLN.
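
Both halves of the argument can be watched numerically. The sketch below takes $VX_k = k/\log k$ and $P(A_k) = 1/(k \log k)$ as above; the function names are illustrative.

```python
import math


def variance_ratio(n):
    """(1 / n^2) * sum_{k=2}^{n} V[X_k] with V[X_k] = k / log k (Markov condition)."""
    return sum(k / math.log(k) for k in range(2, n + 1)) / n**2


def borel_cantelli_block(lo, hi):
    """sum_{k=lo}^{hi} P(A_k) with P(A_k) = 1 / (k log k)."""
    return sum(1.0 / (k * math.log(k)) for k in range(lo, hi + 1))


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        print(n, variance_ratio(n))
    # Each block from N to N^(5/3) adds roughly log(5/3) to the series of P(A_k).
    print(borel_cantelli_block(1001, 100_000))
```

The variance ratio decays like $1/(2 \log n)$, which is slow but sufficient for the Markov condition, while the series of $P(A_k)$ grows like $\log \log n$ and never settles.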


Finally, let us indicate another sequence whose properties are
close to those of $\{X_n\}$. Let $\{Y_n, n \ge 2\}$ be a sequence of i.i.d. r.v.s
such that

$$P[Y_2 = \pm n] = \frac{c}{n^2 \log n}, \quad n = 2, 3, \ldots, \qquad c = \frac12 \Big(\sum_{n=2}^{\infty} \frac{1}{n^2 \log n}\Big)^{-1}.$$

It can be shown that this sequence obeys the WLLN but does not
satisfy the SLLN. The easiest way to do this is to use ch.f.s
showing, for example, that $\psi_n(t) \to 1$ as $n \to \infty$ where $\psi_n$ is the
ch.f. of $(Y_2 + \cdots + Y_n)/n$.


15.4. A sequence of independent absolutely continuous random 
variables satisfying the weak but not the strong law of 
large numbers 

Let $\{X_n, n \ge 1\}$ be independent r.v.s where the density of $X_n$ is
given by

$$f_n(x) = \frac{1}{\sqrt2 \sigma_n} \exp(-\sqrt2 |x| / \sigma_n), \quad x \in \mathbb{R}^1.$$

Then $VX_n = \sigma_n^2$, and define $\sigma_n$ as follows: $\sigma_n = 2n^2/(\log n)^2$, $n \ge 2$.
First we shall establish that $\{X_n\}$ does not obey the SLLN.
Indeed, the probability of the event $A_n = \{|X_n| \ge n\}$ is

$$P(A_n) = \frac{2}{\sqrt2 \sigma_n} \int_n^{\infty} \exp(-\sqrt2 x / \sigma_n)\,dx = \exp\Big[-\tfrac12 \sqrt2 (\log n)^2 / n\Big].$$

Since $(\log n)^2/n \to 0$ as $n \to \infty$, $\sum_{n=2}^{\infty} P(A_n) = \infty$. Using similar
reasoning to that in Example 15.3, we conclude that $\{X_n\}$ does not
obey the SLLN.
Our purpose now is to show that $\{X_n\}$ satisfies the WLLN.
However, one can check that the Markov condition for $\{X_n\}$ does
not hold. Then the proof uses the Feller theorem cited in Example
15.1. First, we can see that

$$P\Big[\frac1n |X_k| \ge \varepsilon\Big] = \exp\Big(-n\varepsilon \frac{(\log k)^2}{\sqrt2\, k^2}\Big) \tag{1}$$

and clearly this probability can be made arbitrarily small for large
n.
For any truncation level c > 0 and $\varepsilon > 0$ we introduce the
variables $\bar{X}_k$ where $\bar{X}_k = X_k$ if $|X_k| < c$ and $\bar{X}_k = c$ if $|X_k| \ge c$.
Using (1) we obtain

$$\sum_{k=2}^{n} P\Big[\frac1n |X_k| \ge \varepsilon\Big] \to 0 \quad\text{as } n \to \infty.$$

Similarly to Example 15.1 we can verify that

$$\frac{1}{n^2} \sum_{k=2}^{n} V\bar{X}_k \to 0 \quad\text{as } n \to \infty.$$

Thus, by the Feller theorem, the sequence $\{X_n\}$ satisfies the
WLLN.


15.5. The Kolmogorov condition $\sum_{n=1}^{\infty} \sigma_n^2/n^2 < \infty$ is the best
possible condition for the strong law of large numbers

Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s with finite
variances $\sigma_n^2$ and $\{b_n, n \ge 1\}$ be a non-decreasing sequence of
positive constants with $b_n \to \infty$. We say that the sequence $\{X_n\}$
obeys the SLLN with respect to $\{b_n\}$ if $b_n^{-1} S_n - b_n^{-1} ES_n \xrightarrow{\text{a.s.}} 0$ as
$n \to \infty$ where $S_n = X_1 + \cdots + X_n$.

According to the Kolmogorov theorem the condition
$\sum_{n=1}^{\infty} \sigma_n^2/b_n^2 < \infty$ implies that $\{X_n\}$ satisfies the SLLN with
respect to $\{b_n\}$. Note that in the classical Kolmogorov theorem $b_n
= n$, $n \ge 1$.

It is of general interest to understand the importance of the
condition $\sum_{n=1}^{\infty} \sigma_n^2/b_n^2 < \infty$ in the SLLN. We shall now show
that this condition is the best possible in the following sense. For
simplicity we confine ourselves to the case $b_n = n$, $n \ge 1$. So, let
$\{\sigma_n^2\}$ be a sequence of positive numbers with

$$\sum_{n=1}^{\infty} \sigma_n^2 / n^2 = \infty.$$

We aim to construct a sequence $\{Y_n, n \ge 1\}$ of independent r.v.s
with $VY_n = \sigma_n^2$ such that $\{Y_n\}$ does not satisfy the SLLN. Let us
describe the sequence $\{Y_n\}$. If $\sigma_n^2/n^2 \le 1$ then the r.v. $Y_n$ takes the
values $(-n)$, 0 and n with probabilities $\sigma_n^2/(2n^2)$, $1 - \sigma_n^2/n^2$ and
$\sigma_n^2/(2n^2)$ respectively. If $\sigma_n^2/n^2 > 1$ then $Y_n = \pm\sigma_n$ with probability $\frac12$
each.

Clearly $EY_n = 0$, $VY_n = \sigma_n^2$. For any $\varepsilon$ with $0 < \varepsilon \le 1$ we have

$$P[|Y_n| \ge \varepsilon n] = \begin{cases} \sigma_n^2/n^2, & \text{if } \sigma_n^2/n^2 \le 1 \\ 1, & \text{if } \sigma_n^2/n^2 > 1. \end{cases} \tag{1}$$

Suppose the sequence $\{Y_n\}$ does obey the SLLN. Then
necessarily $Y_n/n \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$. From (1) it is easy to derive that
$\sum_{n=1}^{\infty} P[|Y_n| \ge \varepsilon n] = \infty$. By the Borel-Cantelli lemma the events
$[|Y_n| \ge \varepsilon n]$ occur infinitely often, so the convergence $Y_n/n \xrightarrow{\text{a.s.}} 0$
as $n \to \infty$ is not possible.
Therefore $\{Y_n\}$ does not obey the SLLN.
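
The construction of $Y_n$ from a prescribed variance can be written out directly. This is a sketch of the two-case recipe described above; the helper names `build_y`, `variance` and `prob_at_least` are hypothetical, and floating-point numbers are used for simplicity.

```python
import math


def build_y(n, sigma2):
    """Law of Y_n as a dict {value: probability} with mean 0 and variance sigma2."""
    ratio = sigma2 / n**2
    if ratio <= 1.0:
        # Three-point law on {-n, 0, n}.
        return {-float(n): ratio / 2.0, 0.0: 1.0 - ratio, float(n): ratio / 2.0}
    # Two-point law on {-sigma_n, sigma_n}.
    sigma = math.sqrt(sigma2)
    return {-sigma: 0.5, sigma: 0.5}


def variance(law):
    """Variance of a finite discrete law."""
    mean = sum(v * p for v, p in law.items())
    return sum((v - mean) ** 2 * p for v, p in law.items())


def prob_at_least(law, threshold):
    """P[|Y| >= threshold]."""
    return sum(p for v, p in law.items() if abs(v) >= threshold)


if __name__ == "__main__":
    small = build_y(10, 25.0)    # sigma^2 / n^2 = 1/4
    large = build_y(3, 36.0)     # sigma^2 / n^2 = 4
    print(variance(small), prob_at_least(small, 5.0))
    print(variance(large), prob_at_least(large, 3.0))
```

In both branches the variance equals the prescribed value, and $P[|Y_n| \ge \varepsilon n]$ equals $\min(\sigma_n^2/n^2, 1)$ for $0 < \varepsilon \le 1$, which is what makes the Borel-Cantelli series diverge.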


15.6. More on the strong law of large numbers without the 
Kolmogorov condition 


Consider the sequence $\{X_n, n \ge 2\}$ of independent r.v.s where

$$P[X_n = \pm (n/\log n)^{1/2}] = \tfrac12. \tag{1}$$

It is easy to check that the Kolmogorov condition
$\sum_{n=2}^{\infty} \sigma_n^2/n^2 < \infty$ is not satisfied. However, Example 15.2
shows that the SLLN can also hold without this condition. In our
specific case the most suitable result which can be applied is the
following theorem (see Révész 1967): let $\{\xi_n\}$ be independent r.v.s
with $E\xi_n = 0$ and let for some $r \ge 1$

$$E[|\xi_n|^{2r}] < \infty \quad\text{and}\quad \sum_{n=1}^{\infty} E[|\xi_n|^{2r}] / n^{r+1} < \infty.$$

Then the sequence $\{\xi_n\}$ satisfies the SLLN.
Clearly for the sequence $\{X_n\}$ defined by (1) it is sufficient to
take r = 2 and verify directly the conditions in the Révész theorem.
Thus we arrive at the conclusion that $\{X_n\}$ obeys the SLLN.
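
The direct verification can be sketched numerically. The code assumes the symmetric two-point law $X_n = \pm (n/\log n)^{1/2}$ with probability 1/2 each, so that $E[|X_n|^{2r}] = (n/\log n)^r$; the function names are illustrative.

```python
import math


def kolmogorov_term(n):
    """sigma_n^2 / n^2 = 1 / (n log n) for the assumed two-point law."""
    return (n / math.log(n)) / n**2


def revesz_term(n, r=2):
    """E|X_n|^(2r) / n^(r+1); for r = 2 this is 1 / (n (log n)^2)."""
    return (n / math.log(n)) ** r / n ** (r + 1)


def block_sum(term, lo, hi):
    """Sum of term(n) for n = lo, ..., hi."""
    return sum(term(n) for n in range(lo, hi + 1))


if __name__ == "__main__":
    # Blocks of the Kolmogorov series do not shrink ...
    print(block_sum(kolmogorov_term, 1001, 100_000))
    # ... whereas blocks of the series with r = 2 are controlled by 1 / log(1000).
    print(block_sum(revesz_term, 1001, 100_000))
```

By the integral test the first series behaves like $\log \log n$ and diverges, while the second has tails bounded by $1/\log N$ and converges.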


15.7. Two ‘near’ sequences of random variables such that the 
strong law of large numbers holds for one of them and 
does not hold for the other 


Consider two sequences of r.v.s, $\{X_n, n \ge 2\}$ and $\{Y_n, n \ge 2\}$ where

$$P[X_n = n/\log n] = P[X_n = -n/\log n] = \log n/(2n), \quad P[X_n = 0] = 1 - \log n/n,$$

$$P[Y_n = \beta n] = P[Y_n = -\beta n] = 1/(2\beta^2 n \log n), \quad P[Y_n = 0] = 1 - 1/(\beta^2 n \log n)$$

with $0 < \beta < 1$. Obviously $X_n$ and $Y_n$ are symmetric r.v.s with

$$EX_n = EY_n = 0, \quad VX_n = VY_n = n/\log n$$
and both satisfy the inequalities 


Leas eke, (eS ey FH Sas 
We are interested to know whether or not these sequences
satisfy the SLLN. We shall show that $\{X_n\}$ obeys the SLLN while
$\{Y_n\}$ does not. For this purpose we introduce $H_r$ where

$$H_r = 2^{-2r} \sum_{n=2^r+1}^{2^{r+1}} VX_n.$$

For any choice of $\varepsilon > 0$ we have $\exp(-\varepsilon/H_r) \le \exp(-\varepsilon r \log 2/2)$
implying

$$\sum_{r=1}^{\infty} \exp(-\varepsilon/H_r) < \infty.$$

However, this condition is sufficient to conclude that the sequence
$\{X_n\}$ obeys the SLLN (see Prohorov 1950).


Suppose now that $\{Y_n\}$ also satisfies the SLLN. Then
necessarily

$$P[Y_n/n \to 0] = 1.$$

It can easily be seen from the definition of $\{Y_n\}$ that

$$\sum_{n=2}^{\infty} P[|Y_n|/n = \beta] = \infty.$$

Then by the Borel-Cantelli lemma, the events $[|Y_n| = n\beta]$ occur
infinitely often. This, however, contradicts the above relation,
namely that $Y_n/n \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$, and therefore the sequence
$\{Y_n\}$ does not obey the SLLN.


15.8. The law of large numbers does not hold if almost sure 
convergence is replaced by complete convergence 


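Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s, $F(x)$, $x \in \mathbb{R}^1$ their
common d.f., and $EX_1 = \int_{-\infty}^{\infty} x\,dF(x) = 0$. Suppose that $\{X_n\}$
satisfies the SLLN. Then

$$Y_n = \frac1n (X_1 + \cdots + X_n) \xrightarrow{\text{a.s.}} 0 \quad\text{as } n \to \infty. \tag{1}$$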


It is natural to ask whether the conditions for the SLLN could 
guarantee that in (1) almost sure convergence can be replaced by a 
stronger kind of convergence, in particular by complete 
convergence (see Example 14.14). 

Under the conditions

$$\int_{-\infty}^{\infty} x\,dF(x) = 0, \quad \sigma^2 = \int_{-\infty}^{\infty} x^2\,dF(x) < \infty \tag{2}$$

Hsu and Robbins (1947) have shown the convergence of the series
$\sum_{n=1}^{\infty} P[|Y_n| > \varepsilon]$ for any $\varepsilon > 0$. Therefore if condition (2) is
satisfied, the sequence $\{Y_n\}$ converges completely. Thus instead of
(1) we have $Y_n \xrightarrow{c} 0$ as $n \to \infty$.
Suppose now that condition (2) is relaxed a little as follows:

$$\int_{-\infty}^{\infty} x\,dF(x) = 0, \quad \int_{-\infty}^{\infty} |x|^{\alpha}\,dF(x) < \infty, \quad \int_{-\infty}^{\infty} x^2\,dF(x) = \infty \tag{3}$$

where $\alpha$ = constant and $\frac12(1 + \sqrt5) \le \alpha < 2$. Then the sequence
$\{X_n\}$ satisfies the SLLN. However, the series $\sum_{n=1}^{\infty} P[|Y_n| > \varepsilon]$
diverges for every $\varepsilon > 0$ and hence the relation $Y_n \xrightarrow{c} 0$ fails to
hold. Therefore there are sequences of i.i.d. r.v.s such that the
corresponding arithmetic means $\{Y_n\}$ converge a.s. but not
completely.

Finally, it remains for us to indicate a particular case when
conditions (3) are satisfied. For example, take $X_1$ to be absolutely
continuous with density $f(x) = |x|^{-3}$ for $|x| \ge 1$ and $f(x) = 0$
otherwise.
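
As a brief worked check, assuming the density $f(x) = |x|^{-3}$ for $|x| \ge 1$ and $f(x) = 0$ otherwise, the relevant integrals can be computed in closed form:

```latex
\begin{align*}
\int_{-\infty}^{\infty} f(x)\,dx &= 2\int_{1}^{\infty} x^{-3}\,dx = 1, \\
\int_{-\infty}^{\infty} |x|\,f(x)\,dx &= 2\int_{1}^{\infty} x^{-2}\,dx = 2 < \infty
  \quad\Longrightarrow\quad \int_{-\infty}^{\infty} x\,f(x)\,dx = 0 \ \text{ by symmetry}, \\
\int_{-\infty}^{\infty} |x|^{\alpha} f(x)\,dx &= 2\int_{1}^{\infty} x^{\alpha-3}\,dx = \frac{2}{2-\alpha} < \infty
  \quad\text{for } \alpha < 2, \\
\int_{-\infty}^{\infty} x^{2} f(x)\,dx &= 2\int_{1}^{\infty} x^{-1}\,dx = \infty .
\end{align*}
```

So f is a density with zero mean, finite absolute moments of every order below 2, and infinite second moment.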


15.9. The uniform boundedness of the first moments of a tight 
sequence of random variables is not sufficient for the 
strong law of large numbers 


Recall that the sequence $\{X_n, n \ge 1\}$ of real-valued r.v.s is said to
be tight if for each $\varepsilon > 0$ there exists a compact interval $K_\varepsilon \subset \mathbb{R}^1$
such that $P[X_n \in K_\varepsilon] > 1 - \varepsilon$ for all n.

Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s. According to
a result derived by Taylor and Wei (1979), if $\{X_n\}$ is a tight
sequence and the rth moments with r > 1 are uniformly bounded
($E[|X_n|^r] \le M$ = constant $< \infty$) then $\{X_n\}$ satisfies the SLLN. Is it
then possible to weaken the assumption for r, r > 1, replacing it by
r = 1 in the above result?

By a specific example we show that the answer to this question
is negative. Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s
such that

$$P[X_n = \pm n] = \tfrac12 [n \log(n+2)]^{-1}, \quad P[X_n = 0] = 1 - [n \log(n+2)]^{-1}.$$

Then $EX_n = 0$, $E[|X_n|] = 1/\log(n+2)$. So $E[|X_n|]$ are uniformly
bounded, and indeed, $E[|X_n|] \to 0$. Taking into account the relation
$P[|X_n| \ge 1] = 1/[n \log(n+2)]$, we conclude that the sequence $\{X_n\}$
is tight.

Further, $\sum_{n=1}^{\infty} P[|X_n| \ge n] = \infty$ and the Borel-Cantelli lemma
implies that the event $[|X_n| \ge n \text{ i.o.}]$ has probability 1. However,
this means that the SLLN cannot be valid for the sequence $\{X_n\}$.


15.10. The arithmetic means of a random sequence can 
converge in probability even if the strong law of large 
numbers fails to hold 

Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s such that $E[|X_1|] = \infty$.
According to the Kolmogorov theorem this sequence does not
satisfy the SLLN, i.e. $Y_n = S_n/n$, where $S_n = X_1 + \cdots + X_n$, is not
a.s. convergent as $n \to \infty$. However we can still ask about the
convergence of $Y_n$, the arithmetic means, in a weaker sense, e.g. in
probability. This possibility is considered in the next example.
Consider the sequence $\{\xi_n, n \ge 1\}$ of i.i.d. r.v.s where

$$P[\xi_1 = (-1)^{k-1} k] = \frac{6}{\pi^2 k^2}, \quad k = 1, 2, \ldots.$$

The divergence of the harmonic series implies that $E[|\xi_1|] = \infty$.
Hence $\{\xi_n\}$ does not satisfy the SLLN.
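
The two facts used here can be checked numerically for the law $P[\xi_1 = (-1)^{k-1} k] = 6/(\pi^2 k^2)$, $k \ge 1$. The function names in this sketch are hypothetical.

```python
import math


def total_mass(K):
    """Partial sum of the probabilities 6 / (pi^2 k^2), k = 1, ..., K."""
    return sum(6.0 / (math.pi**2 * k * k) for k in range(1, K + 1))


def truncated_abs_mean(K):
    """E[|xi_1| ; |xi_1| <= K] = (6 / pi^2) * (1 + 1/2 + ... + 1/K)."""
    return sum(6.0 / (math.pi**2 * k) for k in range(1, K + 1))


if __name__ == "__main__":
    for K in (10, 1_000, 100_000):
        print(K, total_mass(K), truncated_abs_mean(K))
```

The masses add up to 1 in the limit (since $\sum 1/k^2 = \pi^2/6$), while the truncated absolute moments grow like $(6/\pi^2) \log K$, which is the divergence of the harmonic series referred to above.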
Let us show now that the arithmetic means $(\xi_1 + \cdots + \xi_n)/n$
converge in probability to a fixed number as $n \to \infty$. Our reasoning
is based on the following general and very useful result (see e.g.
Feller 1971 or Shiryaev 1995): if the ch.f. $\psi(t) = E[e^{it\xi_1}]$, $t \in \mathbb{R}^1$ of
$\xi_1$ is differentiable at t = 0 and $\psi'(0) = ic$, where $i = \sqrt{-1}$ and $c \in \mathbb{R}^1$,
then $(\xi_1 + \cdots + \xi_n)/n \xrightarrow{P} c$ as $n \to \infty$. Thus we first have to
find the ch.f. $\psi$ of the r.v. $\xi_1$ defined above. We have

$$\psi(t) = \frac{6}{\pi^2} \sum_{k=1}^{\infty} \frac{1}{k^2} \exp\{it(-1)^{k-1} k\}.$$


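If we introduce the functions

$$h_1(u) = \sum_{k=1}^{\infty} \frac{u^{2k-1}}{(2k-1)^2}, \ |u| \le 1 \quad\text{and}\quad h_2(u) = \sum_{k=1}^{\infty} \frac{u^{2k}}{(2k)^2}, \ |u| \le 1$$

we easily find that they both are differentiable and

$$h_1'(u) = \frac{1}{2u} \log\frac{1+u}{1-u}, \quad h_2'(u) = -\frac{1}{2u} \log(1 - u^2).$$

Hence $h_1'(u) - h_2'(u) = (1/u)\log(1+u)$ which implies that $\psi'(0)$
exists and

$$\psi'(0) = \frac{6}{\pi^2}\, i\, [h_1'(1) - h_2'(1)] = i\,\frac{6 \log 2}{\pi^2}.$$

Thus we arrive at the final conclusion that

$$\frac1n (\xi_1 + \cdots + \xi_n) \xrightarrow{P} \frac{6 \log 2}{\pi^2} \quad\text{as } n \to \infty.$$

Note that in this case the sequence $\{\xi_n\}$ satisfies the so-called
generalized law of large numbers (see Example 15.12).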


15.11. The weighted averages of a sequence of random 
variables can converge even if the law of large numbers 
does not hold 


Let $\{X_k, k \ge 1\}$ be a sequence of non-degenerate i.i.d. r.v.s, $\{c_k, k \ge 1\}$
a sequence of positive numbers and let $S_n = \sum_{k=1}^{n} c_k X_k$ and
$C_n = \sum_{k=1}^{n} c_k$, $n \ge 1$. The ratios $S_n/C_n$, $n \ge 1$ are called
weighted averages generated by $\{X_k, c_k, k \ge 1\}$. We say that the
weak (strong) law holds for the weighted averages of $\{X_k, c_k, k \ge 1\}$
iff $S_n/C_n$ converges in probability (a.s.) to some constant as
$n \to \infty$.

Without any loss of generality we can suppose that $EX_k = 0$ for
all $k \ge 1$. We now want to see whether $S_n/C_n$ converges to 0 as
$n \to \infty$.

Obviously if all $c_k = 1$ then $S_n = X_1 + \cdots + X_n$, $C_n = n$ and we
are in the framework of the classical laws of large numbers.

Our aim now is to show that there is a sequence of i.i.d. r.v.s
$\{X_k\}$ and a sequence of weights $\{c_k\}$ such that

$$\frac{S_n}{C_n} \xrightarrow{\text{a.s.}} 0 \quad\text{while}\quad \frac1n (X_1 + \cdots + X_n) \overset{\text{a.s.}}{\not\to} 0 \quad\text{as } n \to \infty.$$

In other words, the strong law holds for weighted averages but the
classical SLLN is not valid. An analogous conclusion can be
drawn about the weak law for weighted averages and the classical
WLLN.

Firstly, consider the strong law. By assumption the variables $X_n$
are identically distributed and in this case the SLLN does not hold
iff $E[|X_1|] = \infty$. Further, we need the following result (see Wright
et al 1977): let $g(x)$, $x \in \mathbb{R}^+$ be a non-negative measurable function with
$g(x) \to \infty$ as $x \to \infty$. Then there exists a sequence $\{X_k, c_k, k \ge 1\}$
whose weighted averages $S_n/C_n$, $n \ge 1$, satisfy the strong law and
$E[g(X_1^+)] = E[g(X_1^-)] = \infty$.

Actually this result contains all that we wanted, namely a
sequence of i.i.d. r.v.s $\{X_k, k \ge 1\}$ with $E|X_1| = \infty$ and a sequence
of weights $\{c_k, k \ge 1\}$ such that $S_n/C_n \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$, although
the sequence $\{X_k\}$ does not obey the classical SLLN.


A similar conclusion can be obtained by using a result of Chow
and Teicher (1971) which states that there is a r.v. X with $E|X| = \infty$
such that the sequence $\{X_k\}$ of independent copies of X together
with a suitable sequence of weights $\{c_k\}$ generates weighted
averages $S_n/C_n$ which converge a.s. as $n \to \infty$. Obviously it is
impossible in this case to take $c_k = 1$ since the classical SLLN for
$\{X_k\}$ is not satisfied. In this connection Chow and Teicher (1971)
give two specific examples. The first one arises in the so-called St
Petersburg game ($X_k = 2^k$ with probability $2^{-k}$ and 0 otherwise),
while in the second case X has a Cauchy distribution.

It is of general interest to compare some consequences of the
results cited above. In particular, let us look at the value of $E[|X|^r]$
for different r. Both examples considered by Chow and Teicher are
such that

$$\lim_{x\to\infty} x^{\varepsilon} P[|X| > x] = 0 \quad\text{for all } 0 < \varepsilon < 1$$

which implies that $E[|X|^r] < \infty$ for all 0 < r < 1. In the result of
Wright et al (1977) we can take the function $g(x) = (\log x)^+$ and
choose a sequence $\{X_k\}$ of i.i.d. r.v.s such that $E[|X|^r] = \infty$ for all
r > 0 and find weights $\{c_k\}$ such that $S_n/C_n \xrightarrow{\text{a.s.}}$ constant. Clearly
the SLLN fails to hold.


Consider now the weak law. It is easy to see that if the weak law
holds for the weighted averages of $\{X_k, c_k\}$ then $\{c_k\}$ must satisfy
the condition

$$C_n \to \infty, \quad c_n/C_n \to 0 \quad\text{as } n \to \infty. \tag{1}$$

According to Jamison et al (1965), the weak law holds for any
sequence of weights $\{c_k\}$ satisfying (1) if $\int_{|x|<T} x\,dF(x) \to$
constant as $T \to \infty$ and

$$\lim_{T\to\infty} T\, P[|X| \ge T] = 0 \tag{2}$$

where F is the d.f. of X. This result and a statement by Loève
(1978) allow us to conclude that if X has a fixed distribution (we
consider only the case of i.i.d.) then the weak law holds for $\{X_k,
c_k\}$ for any $\{c_k\}$ iff $\{X_k\}$ satisfies the classical WLLN (when all $c_k
= 1$).

However, using the result of Wright et al with $g(x) = x^r$ and 0 <
r < 1, one can obtain a sequence $\{X_k, c_k\}$ for which the weak law
holds but condition (2) is not satisfied. In such a case the weak law
does not hold for the sequence $\{X_k, 1\}$. Obviously this means that
the sequence $\{X_k\}$ does not obey the classical WLLN in spite of
the fact that for some weights $\{c_k\}$, the weighted averages $S_n/C_n$
converge in probability.


15.12. The law of large numbers with a special choice of 
norming constants 

Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s and $S_n = X_1 +
\cdots + X_n$. If for some number sequences $\{a_n, n \ge 1\}$ and $\{b_n, n \ge 1\}$,
with all $b_n > 0$, the following relation holds:

$$(S_n - a_n)/b_n \to 0 \quad\text{as } n \to \infty \tag{1}$$

then we say that $\{X_n\}$ satisfies a generalized law of large numbers
(LLN). This law is weak or strong depending on the type of
convergence in (1). If $a_n = ES_n$ and $b_n = n$ we obtain the scheme of
the classical LLN. There are sequences of r.v.s for which the
classical LLN does not hold, but for some choice of $\{a_n\}$ and $\{b_n\}$
the generalized LLN holds. Let us consider an example.

In the well known St Petersburg game (also mentioned in
Example 15.11), a player wins $2^k$ roubles if heads first appears at
the kth toss of a symmetric coin, k = 1, 2, .... Thus we get a
sequence of independent r.v.s $\{X_k, k \ge 1\}$ where $P[X_k = 2^k] = 2^{-k}
= 1 - P[X_k = 0]$. It is easy to check that $\{X_k\}$ does not obey the
WLLN. However, we can hope that a relation like (1) will hold.
Using game terminology, suppose that a player pays variable
entrance fees with a cumulative fee $b_n = n \log n$ for the first n
games. Then the game becomes 'fair' in the sense that

$$\lim_{n\to\infty} S_n/b_n = 1 \quad\text{in probability.} \tag{2}$$

It is natural to ask whether this game is 'fair' in a strong sense,
that is, whether (2) is satisfied with probability 1. Actually we shall
show that the St Petersburg game with $b_n = n \log n$ is 'fair' in a
weak but not in a strong sense. In other words, it will be shown
that $\{X_k\}$ obeys the weak but not the strong generalized LLN with
$a_n = b_n = n \log n$, $n \ge 2$.
= P 
The result that Sn/n —*1 as n — o is left to the reader as a 
useful exercise. Further, it is easy to see that PLX, > c] = I/c for 


any c > | and every n = 2. Hence for c = constant > 1 and n > 2 we 
have 


P/X, > cb,| > 1/(cb,) =1/(cnlogn) and >, PAG C0, | Se 


This and the Borel-Cantelli lemma imply that PLX,,/b, > c 1.0.] = 1. 
Thus 


PilimX,,/b, = co] =1 and PilimS,,/b, = oo] = 1. 


Therefore 
P| lim S,,/b, = 1] = 0 
Th OO 


showing that (2) is satisfied for convergence in probability but not 
a.s. 
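The two facts used in the Borel-Cantelli step can be checked numerically. The sketch below is only an illustration: it assumes the standard St Petersburg payoff $P[X = 2^k] = 2^{-k}$, $k = 1, 2, \dots$, and natural logarithms, and the function names `tail` and `partial_sum` are ours.

```python
from fractions import Fraction
import math


def tail(c: int) -> Fraction:
    """P[X > c] for the payoff X with P[X = 2**k] = 2**-k, k = 1, 2, ... (integer c >= 1)."""
    # X > c iff k >= m, where m is the smallest k with 2**k > c; for an integer c >= 1
    # this is m = c.bit_length().
    m = c.bit_length()
    # Geometric tail: sum_{k >= m} 2**-k = 2**-(m - 1).
    return Fraction(1, 2 ** (m - 1))


def partial_sum(c: float, n_max: int) -> float:
    """Partial sum of 1 / (c * n * log n) over n = 2, ..., n_max."""
    return sum(1.0 / (c * n * math.log(n)) for n in range(2, n_max + 1))


if __name__ == "__main__":
    print(all(tail(c) >= Fraction(1, c) for c in range(1, 5000)))
    # The partial sums keep growing, roughly like log(log(n)) / c.
    for n_max in (10**2, 10**4, 10**6):
        print(n_max, partial_sum(1.0, n_max))
```

The slow, $\log\log n$-type growth of the partial sums is what makes the series diverge, and this is exactly what the second Borel-Cantelli lemma needs.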


SECTION 16. WEAK CONVERGENCE OF PROBABILITY MEASURES AND DISTRIBUTIONS


In Section 14 we introduced the notion of convergence in 
distribution and illustrated it by examples. In particular, we 
mentioned that this kind of convergence is close to so-called weak 
convergence. In this section we define weak convergence and 
clarify its relationship with other kinds of convergence. 


Let $F_n$, $n \ge 1$, and $F$ be d.f.s over the real line $R^1$. Denote
by $P_n$ and $P$ the probability measures over $(R^1, \mathcal{B}^1)$
generated by $F_n$ and $F$ respectively. Recall that $P_n$ and $P$ are
determined uniquely by the relations $P_n(-\infty, x] = F_n(x)$ and
$P(-\infty, x] = F(x)$, $x \in R^1$. Since $F$ is continuous at the
point $x$ iff $P(\{x\}) = 0$, convergence in distribution
$F_n \xrightarrow{d} F$ means that $P_n(-\infty, x] \to P(-\infty, x]$
for every $x$ such that $P(\{x\}) = 0$. Let us consider a more general
situation.

For any Borel set $A$ in $R^1$ (that is, $A \in \mathcal{B}^1$),
$\partial A$ will denote the boundary of $A$. Suppose $P$ and $P_n$,
$n \ge 1$, are probability measures on $(R^1, \mathcal{B}^1)$. We say
that the sequence $\{P_n\}$ converges weakly to $P$ and write
$P_n \xrightarrow{w} P$, if for any $A \in \mathcal{B}^1$ with
$P(\partial A) = 0$ we have $P_n(A) \to P(A)$ as $n \to \infty$.

Now we formulate the following fundamental result.


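Theorem 1. The following statements are equivalent:

(a) $P_n \xrightarrow{w} P$ as $n \to \infty$.
(b) $\limsup_{n \to \infty} P_n(A) \le P(A)$ for any closed set $A \in \mathcal{B}^1$.
(c) $\liminf_{n \to \infty} P_n(A) \ge P(A)$ for any open set $A \in \mathcal{B}^1$.
(d) For every continuous and bounded function $g$ on $R^1$ we have

$$\int_{R^1} g(x) P_n(dx) \to \int_{R^1} g(x) P(dx) \quad \text{as } n \to \infty.$$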


Weak convergence can be studied in much more general
situations, not just for probability measures defined on the real line
$R^1$. However, convergence in distribution treated in Section 14 is
equivalent to weak convergence discussed above. If we work with
probability measures, the term weak convergence is preferable,
while for d.f.s both terms, weak convergence and convergence in
distribution, are used, as well as both notations,
$F_n \xrightarrow{w} F$ and $F_n \xrightarrow{d} F$.

We now formulate another fundamental result connecting the 
weak convergence of d.f.s with the pointwise convergence of the 
corresponding ch.f.s. 


Theorem 2. (Continuity theorem.) Let $\{F_n, n \ge 1\}$ be a
sequence of d.f.s on $R^1$ and $\{\phi_n, n \ge 1\}$ be the corresponding
sequence of ch.f.s.

(a) If $F_n \xrightarrow{w} F$ for a d.f. $F$, then
$\phi_n(t) \to \phi(t)$, $t \in R^1$, where $\phi$ is the ch.f. of $F$.

(b) If $\lim_{n \to \infty} \phi_n(t)$ exists for each $t \in R^1$ and
$\phi(t) := \lim_{n \to \infty} \phi_n(t)$ is continuous at $t = 0$, then
$\phi$ is the ch.f. of a d.f. $F$ and $F_n \xrightarrow{w} F$ as
$n \to \infty$.


We refer the reader to the books by Billingsley (1968, 1995), 
Chung (1974) or Shiryaev (1995) for a detailed proof of Theorems 
1 and 2 and of several others. 

In this section we have included examples illustrating some 
aspects of the weak convergence of probability measures, 
distributions, and densities. 


16.1. Defining classes and classes defining convergence 


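Let $(\Omega, \mathcal{F})$ be a measurable space and $P$, $Q$
probabilities on this space. The class of events
$\mathcal{A} \subset \mathcal{F}$ is said to be a defining class, if

$$P = Q \text{ on } \mathcal{A} \ \Rightarrow\ P = Q \text{ on } \mathcal{F}.$$

We say that $\mathcal{A} \subset \mathcal{F}$ is a class defining
convergence if

$$P_n(A) \to P(A) \text{ for all sets } A \in \mathcal{A} \text{ with } P(\partial A) = 0$$
$$\Rightarrow\ P_n(A) \to P(A) \text{ for all sets } A \in \mathcal{F} \text{ with } P(\partial A) = 0,$$

that is, $P_n \xrightarrow{w} P$ as $n \to \infty$.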


Let us illustrate the relationship between these two notions. 


(i) Obviously every class defining convergence is a defining class.
However, the converse is not always true.

Let $\Omega = [0,1)$, $\mathcal{F} = \mathcal{B}_{[0,1)}$ and
$\mathcal{A} \subset \mathcal{F}$ be the field of all finite sums of
disjoint subintervals of the type $[a, b)$ where $0 < a < b < 1$.
Then $\mathcal{A}$ is a defining class but not a class defining
convergence. To see this it is enough to consider the probabilities
$P_n$ and $P$ concentrated at the points $1 - 1/n$ and $0$
respectively.
(ii) Let $\{P_n, n \ge 1\}$, $P$ and $Q$ be probabilities on
$(\Omega, \mathcal{F})$ where $\Omega = R^1$, $\mathcal{F} = \mathcal{B}^1$
and let $\mathcal{A} \subset \mathcal{F}$ be a defining class. Suppose
two conditions are satisfied:

(1) $P_n(A) \to Q(A)$ as $n \to \infty$ for all $A \in \mathcal{A}$

and

(2) $P_n \xrightarrow{w} P$ as $n \to \infty$.

Since $\mathcal{A}$ is a defining class, from (1) and (2) we could
expect that $P = Q$. However, this is not the case. Define $P_n$, $P$
and $Q$ as follows:


It is easy to see that P,, — P as n — oo. Further, let B consist of 
the points 0, 1, n 5. 1 + n where n = 1. Denote by A the field 
containing all 4 < ¥ such that either AB is finite and 0 € A, or ACB 
is finite and 0 € AC. Then A is a defining class and P,,(A) > Q(A) 
as n — oo for every A € A. So (1) and (2) are satisfied, but P # Q. 


(iii) Let $C[0,1]$ be the space of all continuous functions on $[0,1]$
and $\mathcal{C}$ its Borel $\sigma$-field. For $k \in \mathbb{N}$ and
$t_1, \dots, t_k \in [0,1]$ let

$$\pi_{t_1 \dots t_k} : C[0,1] \to R^k$$

map the point (function) $x \in C[0,1]$ into the point
$(x(t_1), \dots, x(t_k)) \in R^k$. The finite-dimensional sets
(cylinders) in $C[0,1]$ are defined as sets of the form
$\pi^{-1}_{t_1 \dots t_k} H$ where $H \in \mathcal{B}^k$. Denote by
$\mathcal{A}$ the class of all such sets. Since the $\sigma$-field
$\mathcal{C}$ is generated by $\mathcal{A}$, $\mathcal{A}$ is a defining
class. This leads to the following question: does $\mathcal{A}$ form a
class defining convergence? The answer is positive if we consider the
space $(R^\infty, \mathcal{B}^\infty)$ and the class $\mathcal{A}$
consisting of the finite-dimensional sets in $R^\infty$.

However, as we shall show now, in the space $C[0,1]$, $\mathcal{A}$ need
not be a class defining convergence. To see this, consider the
probability measures $P$ and $P_n$ where $P$ is concentrated on the
function $x \equiv 0$ (that is, $x(t) = 0$, $t \in [0,1]$) and $P_n$ is
concentrated on the function $x_n$ defined by

$$x_n(t) = \begin{cases} nt, & \text{if } 0 \le t \le 1/n \\ 2 - nt, & \text{if } 1/n < t \le 2/n \\ 0, & \text{if } 2/n < t \le 1. \end{cases}$$

Since $x_n$ does not converge to 0 uniformly in $C[0,1]$, the
measures $P_n$ cannot converge weakly to $P$ as $n \to \infty$. For
example, if $A = S(0, \tfrac12)$ is the ball in $C[0,1]$ with centre at
0 and radius $\tfrac12$, then $P(\partial A) = 0$ but
$P_n(A) = 0 \not\to P(A) = 1$.

The relation $P_n(A) \to P(A)$ holds for any finite-dimensional set
$A$ in $C[0,1]$ with $P(\partial A) = 0$. This follows from the equality
$P_n(A) = P(A)$ which is satisfied for any $A$ of the form
$\pi^{-1}_{t_1 \dots t_k} H$, $H \in \mathcal{B}^k$, and $n \ge n_0$
where $n_0 = [2/t_{\min}] + 1$ with
$t_{\min} = \min\{t_i : t_i \ne 0\}$.

This example shows that weak convergence in the space $C[0,1]$
cannot be characterized by convergence for all finite-dimensional
sets (as in $R^\infty$).
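The mechanism of part (iii) can be seen in a few lines of exact arithmetic. The following is a minimal sketch of the tent functions $x_n$; the helper names `x_n` and `n0` are ours. At any fixed finite set of time points the function $x_n$ vanishes once $n \ge n_0$, while its maximum, attained at $t = 1/n$, stays equal to 1.

```python
from fractions import Fraction


def x_n(n: int, t: Fraction) -> Fraction:
    """Tent function: n*t on [0, 1/n], 2 - n*t on (1/n, 2/n], 0 on (2/n, 1]."""
    if t <= Fraction(1, n):
        return n * t
    if t <= Fraction(2, n):
        return 2 - n * t
    return Fraction(0)


def n0(times) -> int:
    """Index [2 / t_min] + 1, where t_min is the smallest nonzero time point."""
    t_min = min(t for t in times if t != 0)
    return int(2 / t_min) + 1


if __name__ == "__main__":
    times = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)]
    start = n0(times)
    # Finite-dimensional projections are identically zero from n0 on ...
    print(all(x_n(n, t) == 0 for n in range(start, 200) for t in times))
    # ... but the sup-norm distance to the zero function is always 1.
    print(all(x_n(n, Fraction(1, n)) == 1 for n in range(1, 200)))
```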


16.2. In the case of convergence in distribution, do the 
corresponding probability measures converge for all 
Borel sets? 


Let $F_0(x)$, $F_n(x)$, $n \ge 1$, be d.f.s and $\mu_0$, $\mu_n$,
$n \ge 1$, their probability measures on $(R^1, \mathcal{B}^1)$. Suppose
$F_n \xrightarrow{d} F_0$ as $n \to \infty$. It follows that

$$\mu_n((-\infty, x]) \to \mu_0((-\infty, x])$$

for every $x \in R^1$ which is a continuity point of $F_0$. However this
is a convergence of $\mu_n$ to $\mu_0$ only for a special kind of sets,
namely for infinite intervals, which of course belong to
$\mathcal{B}^1$. Thus we arrive at the following question.

Is it true that $F_n \xrightarrow{d} F_0$ implies
$\mu_n(B) \to \mu_0(B)$ for all $B \in \mathcal{B}^1$?

In fact, the negative answer to this question is contained in the
definition of convergence in distribution. Perhaps the easiest
illustration is to take $F_n(x) = 1_{[1/n, \infty)}(x)$, $n \ge 1$, and
$F_0(x) = 1_{[0, \infty)}(x)$, $x \in R^1$. Then obviously
$F_n(x) \to F_0(x)$ as $n \to \infty$ for all $x$ except the only point
$x = 0$ where $F_0$ has a jump (of size 1). Thus in this completely
degenerate case we obviously have $F_n \xrightarrow{d} F_0$ as
$n \to \infty$. Taking, for example, the Borel set $(-\infty, 0]$, we
find

$$\mu_n((-\infty, 0]) = F_n(0) = 0 \not\to \mu_0((-\infty, 0]) = F_0(0) = 1 \quad \text{as } n \to \infty.$$

In the above case the limiting function $F_0$ is discontinuous. Let
us assume now that $F_0$ is continuous everywhere on $R^1$. Of course,
if $F_n \xrightarrow{d} F_0$ and $B$ is a Borel set with
$\mu_0(\partial B) = 0$, then $\mu_n(B) \to \mu_0(B)$ as
$n \to \infty$. Let us illustrate what we can expect if
$\mu_0(\partial B) \ne 0$.

Consider the r.v.s $X_0$ and $X_n$, $n \ge 1$, where $X_0$ is uniformly
distributed on the interval $(0,1)$ and $X_n$ is defined by
$P[X_n = k/n] = 1/n$ for $k = 0, 1, \dots, n - 1$, $n \ge 1$ (uniform
discrete distribution). If $F_0$ and $F_n$ are the d.f.s of $X_0$ and
$X_n$, respectively, we have (with $[a]$ standing for the integer part
of $a$)

$$F_0(x) = \begin{cases} 0, & \text{if } x < 0 \\ x, & \text{if } 0 \le x < 1 \\ 1, & \text{if } x \ge 1, \end{cases} \qquad F_n(x) = \begin{cases} 0, & \text{if } x < 0 \\ [nx]/n, & \text{if } 0 \le x < 1 \\ 1, & \text{if } x \ge 1. \end{cases}$$

Since $|[nx]/n - x| \le 1/n$ for any $x \in [0,1)$ and any $n \ge 1$, we
conclude that $X_n \xrightarrow{d} X_0$ as $n \to \infty$ (equivalently,
that $F_n \xrightarrow{d} F_0$). Denote by $P_0$ and $P_n$ the measures
on $R^1$ induced by $F_0$ and $F_n$ and let $\mathbb{Q}$ be the set of
all rational numbers in $R^1$. Then $P_n(\mathbb{Q}) = 1$ for each $n$,
$P_0(\mathbb{Q}) = 0$ and hence

$$\lim_{n \to \infty} P_n(\mathbb{Q}) = 1 \ne 0 = P_0(\mathbb{Q}).$$

In this example $P_0(\partial \mathbb{Q}) = 1$, that is, the crucial
condition $P_0(\partial B) = 0$ is not satisfied for $B = \mathbb{Q}$.
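A small exact computation makes both halves of this example concrete. In the sketch below (function names are ours) the d.f. of the discrete uniform law is computed directly from its atoms $k/n$, $k = 0, \dots, n-1$, using the convention $F(x) = P[X \le x]$, so it does not depend on how the closed form is written. The distance to the uniform d.f. is at most $1/n$, yet every atom is a rational number.

```python
from fractions import Fraction


def atoms(n: int):
    """Atoms k/n, k = 0, ..., n-1, each carrying mass 1/n."""
    return [Fraction(k, n) for k in range(n)]


def F_n(n: int, x: Fraction) -> Fraction:
    """P[X_n <= x], computed by counting atoms."""
    return Fraction(sum(1 for a in atoms(n) if a <= x), n)


def F_0(x: Fraction) -> Fraction:
    """D.f. of the uniform distribution on (0, 1)."""
    return min(max(x, Fraction(0)), Fraction(1))


if __name__ == "__main__":
    grid = [Fraction(i, 97) for i in range(-10, 110)]
    for n in (1, 5, 50):
        gap = max(abs(F_n(n, x) - F_0(x)) for x in grid)
        print(n, gap, gap <= Fraction(1, n))
    # All the mass of P_n sits on rationals, which have Lebesgue measure 0.
    print(all(isinstance(a, Fraction) for a in atoms(50)))
```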
Note that the limiting function $F_0$ is not only continuous, it is
absolutely continuous with a finite support (uniform distribution on
$(0,1)$). A conclusion similar to the above concerning the eventual
convergence of $P_n$ to $P_0$ can also be derived for absolutely
continuous $F_0$ having the whole real line $R^1$ as its support.

Consider a sequence of independent Bernoulli r.v.s $\xi_1, \xi_2, \dots$:
$P[\xi_i = 1] = p$, $P[\xi_i = 0] = q$, $q = 1 - p$, $0 < p < 1$. Denote
by $G_n$ the d.f. of the quantity $S_n^* = (S_n - np)/(npq)^{1/2}$ where
$S_n = \xi_1 + \cdots + \xi_n$, and let $\theta$ be a r.v. distributed
normally $N(0,1)$. Then $S_n^* \xrightarrow{d} \theta$, or
equivalently, $G_n \xrightarrow{d} \Phi$ as $n \to \infty$ ($\Phi$ is the
standard normal d.f.). If $P_0$ and $P_n$ are the measures on $R^1$
induced by $\Phi$ and $G_n$, and the Borel set $B$ is defined by
$B = \bigcup_{n=1}^{\infty} \bigcup_{k=0}^{n} \{(k - np)/(npq)^{1/2}\}$,
then obviously

$$P_n(B) = P[S_n^* \in B] = 1 \not\to P_0(B) = P[\theta \in B] = 0 \quad \text{as } n \to \infty.$$

Once again this is due to the fact that the condition
$P_0(\partial B) = 0$ is not satisfied.


16.3. Weak convergence of probability measures need not be 
uniform 


Let $F_0(x)$, $F_n(x)$, $x \in R^1$, $n \ge 1$, be d.f.s and $\mu_0$,
$\mu_n$, $n \ge 1$, their corresponding probability measures on
$(R^1, \mathcal{B}^1)$. Let us suppose that

$$\lim_{n \to \infty} \mu_n(B) = \mu_0(B) \quad \text{for all } B \in \mathcal{B}^1. \tag{1}$$

It is natural to ask if (1) holds uniformly in $B$. The example
below shows that in general the answer is negative even for
absolutely continuous d.f.s. Indeed, if $F_0$, $F_n$, $n \ge 1$, have
densities $f_0$, $f_n$, $n \ge 1$, respectively, then (1) can be written
in the form

$$\lim_{n \to \infty} \int_B f_n(x)\,dx = \int_B f_0(x)\,dx, \quad B \in \mathcal{B}^1. \tag{2}$$

Consider now the following functions:

$$f_n(x) = \begin{cases} 1 + \sin(2\pi n x), & \text{if } x \in [0,1] \\ 0, & \text{if } x \notin [0,1], \end{cases} \qquad f_0(x) = \begin{cases} 1, & \text{if } x \in [0,1] \\ 0, & \text{if } x \notin [0,1]. \end{cases}$$

It is easy to see that $f_0$ and $f_n$ for each $n \ge 1$ are density
functions. Clearly $f_0$ is a uniform density on $[0,1]$. If $F_0$ and
$F_n$, $n \ge 1$, are the d.f.s of $f_0$ and $f_n$, $n \ge 1$, then
$F_n \xrightarrow{d} F_0$ as $n \to \infty$. Moreover, applying the
Riemann-Lebesgue theorem (see Rudin 1966; Royden 1968), we conclude
that relation (2), and hence (1), is satisfied for this choice of $f_0$,
$f_n$, $n \ge 1$, and for all $B \in \mathcal{B}^1$.

Consider now the sets $B_n = \{x \in [0,1] : f_n(x) \ge 1\}$, $n \ge 1$.
Then

$$\int_{B_n} f_0(x)\,dx = \frac12, \qquad \int_{B_n} f_n(x)\,dx = \frac12 + \frac1\pi, \quad n = 1, 2, \dots.$$

Hence $\mu_n(B_n) - \mu_0(B_n) = 1/\pi$ for every $n$. Therefore in
general the convergence in (1) and (2) can be non-uniform.
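The two integrals over $B_n$ can be verified numerically. The sketch below uses a plain midpoint rule with the grid aligned to the zeros of $\sin(2\pi n x)$; the function names and the grid size are our choices, not part of the example.

```python
import math


def f_n(n: int, x: float) -> float:
    """Density 1 + sin(2*pi*n*x) on [0, 1], zero elsewhere."""
    return 1.0 + math.sin(2.0 * math.pi * n * x) if 0.0 <= x <= 1.0 else 0.0


def f_0(x: float) -> float:
    """Uniform density on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def mass_on_B(n: int, density, cells_per_half_period: int = 1000) -> float:
    """Midpoint-rule integral of `density` over B_n = {x in [0, 1] : f_n(x) >= 1}."""
    steps = 2 * n * cells_per_half_period  # cell edges coincide with the zeros of the sine
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        if f_n(n, x) >= 1.0:
            total += density(x) * h
    return total


if __name__ == "__main__":
    for n in (1, 3, 10):
        mu0 = mass_on_B(n, f_0)
        mun = mass_on_B(n, lambda x: f_n(n, x))
        # The gap stays near 1/pi for every n.
        print(n, mu0, mun, mun - mu0)
```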


16.4. Two cases when the continuity theorem is not valid 


Let $F_0$, $F_n$, $n \ge 1$, be d.f.s with ch.f.s $\phi_0$, $\phi_n$,
$n \ge 1$, respectively. The continuity theorem states that as
$n \to \infty$,

$$F_n \xrightarrow{d} F_0 \iff \phi_n(t) \to \phi_0(t), \text{ where } \phi_0 \text{ is continuous at } 0.$$

Let us show that the continuity of $\phi_0$ at 0 is essential.

(i) Consider the sequence of r.v.s $\{X_n, n \ge 1\}$ where
$X_n \sim N(0, n)$. Then the ch.f. $\phi_n$ of $X_n$ is given by
$\phi_n(t) = \exp(-\tfrac12 n t^2)$, $t \in R^1$. Obviously we have
$\phi_n(t) \to \phi(t)$ as $n \to \infty$ where

$$\phi(t) = \begin{cases} 0, & \text{if } t \ne 0 \\ 1, & \text{if } t = 0. \end{cases}$$

Thus the limiting function $\phi$ is discontinuous at 0 and hence the
continuity theorem does not hold. On the other hand, we have

$$F_n(x) = P[X_n \le x] = P[n^{-1/2} X_n \le n^{-1/2} x] = \Phi(n^{-1/2} x) \to \tfrac12 \quad \text{as } n \to \infty.$$

Clearly $\lim_{n \to \infty} F_n(x) = F(x)$ exists for all $x \in R^1$
but $F(x) \equiv \tfrac12$ is not a d.f.

(ii) Consider the family of functions $\{F_n, n \ge 1\}$ where

$$F_n(x) = \begin{cases} 0, & \text{if } x < -n \\ (n + x)/(2n), & \text{if } -n \le x < n \\ 1, & \text{if } x \ge n. \end{cases}$$

Then for each $n$, $F_n$ is a d.f. and clearly for all $x \in R^1$ we
have $\lim_{n \to \infty} F_n(x) = \tfrac12$. Thus the sequence
$\{F_n\}$ is convergent but its limit, the constant $\tfrac12$, is not a
d.f. A simple explanation of this fact can be given if we consider the
ch.f. $\phi_n$ of $F_n$. Since $\phi_n(t) = (\sin nt)/(nt)$, then

$$\lim_{n \to \infty} \phi_n(t) = \phi(t) = \begin{cases} 0, & \text{if } t \ne 0 \\ 1, & \text{if } t = 0. \end{cases}$$

Again, as in case (i), the limiting function $\phi$ is discontinuous at 0
and therefore the continuity theorem cannot be applied.
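Both cases are easy to tabulate. The following sketch evaluates the two families of ch.f.s, $\exp(-\tfrac12 n t^2)$ and $(\sin nt)/(nt)$, and the normal d.f. $\Phi(x/\sqrt{n})$ from case (i); the function names are ours.

```python
import math


def phi_normal(n: int, t: float) -> float:
    """Ch.f. of N(0, n): exp(-n t^2 / 2)."""
    return math.exp(-0.5 * n * t * t)


def phi_uniform(n: int, t: float) -> float:
    """Ch.f. of the uniform distribution on [-n, n]: sin(nt) / (nt), with value 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(n * t) / (n * t)


def F_normal(n: int, x: float) -> float:
    """D.f. of N(0, n) at x, i.e. Phi(x / sqrt(n))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * n)))


if __name__ == "__main__":
    for n in (1, 10**2, 10**4, 10**6):
        # At t = 0 both ch.f.s equal 1; at t = 0.1 they shrink to 0;
        # the d.f. at x = 3 drifts to 1/2.
        print(n, phi_normal(n, 0.0), phi_normal(n, 0.1),
              phi_uniform(n, 0.1), F_normal(n, 3.0))
```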
16.5. Weak convergence and Lévy metric 


For two given d.f.s $F(x)$, $x \in R^1$, and $G(x)$, $x \in R^1$, the
following quantity

$$L(F, G) = \inf\{\varepsilon > 0 : F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon,\ x \in R^1\}$$

is called the Lévy metric (distance) between $F$ and $G$. Note that
$L(\cdot, \cdot)$ is a metric in the space of all d.f.s and plays an
essential role in probability theory; e.g. the following result is
frequently used. Let $F$ and $F_n$, $n \ge 1$, be d.f.s. Then, as
$n \to \infty$,

$$F_n \xrightarrow{w} F \iff L(F_n, F) \to 0.$$

Consider now the sequence $\{X_n, n \ge 1\}$ of independent r.v.s.
Denote $S_n = X_1 + \cdots + X_n$, $s_n^2 = VS_n$ and let $F_n$ be the
d.f. of $S_n/s_n$. Suppose the variables $X_n$ are such that
$F_n \xrightarrow{w} G$ as $n \to \infty$, where $G$ is a d.f.
(Actually, $G$ belongs to the class of infinitely divisible
distributions.) This is equivalent to saying that for any
$\varepsilon > 0$ there is an index $n_\varepsilon$ such that for all
$n \ge n_\varepsilon$ we have $L(F_n, G) < \varepsilon$.

Since the quantity $L(F_n, G)$ is 'small', we can suggest that another
related quantity, $L(\tilde F_n, G_n)$, is also 'small'. Here
$\tilde F_n$ is the d.f. of $S_n$ (without normalization!) and
$G_n(x) = G(x/s_n)$. In several cases such a statement is true, but not
always, as in the next example.
Let $X_{nj}$, $j = 1, \dots, n$, $n \ge 1$, be independent r.v.s where

$$P[X_{nj} = \pm 1] = \frac12\Big(1 - \frac1n\Big), \qquad P[X_{nj} = \pm n\sqrt5] = \frac{1}{2n}, \qquad j = 1, \dots, n.$$

If $S_n = X_{n1} + \cdots + X_{nn}$, then $ES_n = 0$ and
$s_n^2 = VS_n = 5n^2 + n - 1 \to \infty$ as $n \to \infty$. For the
normalized variable $\eta_n = S_n/(n\sqrt5)$ we have $E\eta_n = 0$ and
$V\eta_n = 1 + (n - 1)/(5n^2)$, implying that $V\eta_n \to 1$ as
$n \to \infty$. Let us find the limit of the d.f.
$F_n(x) = P[\eta_n \le x]$ as $n \to \infty$. In this case the best way
is to find the ch.f. $\psi_n(t) = E[e^{it\eta_n}]$. By using the
structure of the variables $X_{nj}$ and the properties of ch.f.s we find
that

$$\lim_{n \to \infty} \psi_n(t) = \psi(t) = \exp(\cos t - 1), \quad t \in R^1.$$

However $\psi(t) = \exp(\cos t - 1)$ is a ch.f. corresponding to a
concrete r.v., say $\eta_0$, and $\eta_0 = \xi_1 - \xi_2$ with $\xi_1$
and $\xi_2$ independent r.v.s each having a Poisson distribution with
parameter $\tfrac12$. Hence by the continuity theorem, we have

$$F_n \xrightarrow{w} G \quad \text{as } n \to \infty$$

with $G(x) = P[\eta_0 \le x]$, $x \in R^1$, or equivalently
$\lim_{n \to \infty} L(F_n, G) = 0$.
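The limit of the ch.f.s can be checked numerically. The sketch below assumes the two-point structure $P[X_{nj} = \pm 1] = \tfrac12(1 - 1/n)$, $P[X_{nj} = \pm n\sqrt5] = 1/(2n)$, which gives $\psi_n(t) = [(1 - 1/n)\cos(t/(n\sqrt5)) + (1/n)\cos t]^n$. It also confirms that $\exp(\cos t - 1)$ is the ch.f. of a difference of two independent Poisson($\tfrac12$) variables. Function names are ours.

```python
import cmath
import math


def psi_n(n: int, t: float) -> float:
    """Ch.f. of eta_n = S_n / (n sqrt 5) for the assumed two-point structure."""
    single = (1.0 - 1.0 / n) * math.cos(t / (n * math.sqrt(5.0))) + math.cos(t) / n
    return single ** n


def psi_limit(t: float) -> float:
    """Limiting ch.f. exp(cos t - 1)."""
    return math.exp(math.cos(t) - 1.0)


def psi_poisson_difference(t: float) -> complex:
    """Ch.f. of xi_1 - xi_2 with xi_1, xi_2 independent Poisson(1/2)."""
    forward = cmath.exp(0.5 * (cmath.exp(1j * t) - 1.0))
    backward = cmath.exp(0.5 * (cmath.exp(-1j * t) - 1.0))
    return forward * backward


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 3.0):
        print(t, psi_n(10**6, t), psi_limit(t), abs(psi_poisson_difference(t) - psi_limit(t)))
```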

Thus the quantity $L(F_n, G)$ is 'small' and we want to see if
$L(\tilde F_n, G_n)$ is also 'small'. Recall that $\tilde F_n$ is the
d.f. of $S_n$ itself, while $G_n(x) = G(x/s_n)$. Note first that
$\tilde F_n$ and $G_n$ correspond to discrete r.v.s. Specifically, the
values of $S_n$ are in the set
$\{\pm k n\sqrt5 \pm l : k, l = 0, 1, \dots, n\}$ and the d.f.
$\tilde F_n$ has jumps at points of this set. Further,
$G_n(x) = P[\zeta_n \le x]$, where $\zeta_n = \eta_0 \cdot n\sqrt5$, and
it is obvious that $\zeta_n$ takes its values in the set
$\{0, \pm k n\sqrt5 : k = 1, 2, \dots\}$, at each point of which $G_n$
has a jump. In particular, $P[\eta_0 = 0] > 0$, which implies that for
an odd index $n$ we can find a number $c > 0$ (expressed through
$P[\eta_0 = 0]$) such that $L(\tilde F_n, G_n) \ge c$. Hence we conclude
that in this case the quantity $L(\tilde F_n, G_n)$ is not small.


16.6. A sequence of probability density functions can converge
in the mean of order 1 without converging
everywhere


Let $f_0(x), f_1(x), f_2(x), \dots$, $x \in R^1$, be probability density
functions. Here we consider two kinds of convergence of $f_n$ to $f_0$:
convergence almost everywhere and convergence in the mean of
order 1, which are expressed respectively by

$$\lim_{n \to \infty} f_n(x) = f_0(x) \quad \text{almost everywhere,} \tag{1}$$

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} |f_n(x) - f_0(x)|\,dx = 0. \tag{2}$$

Let us compare (1) and (2). According to a result by Robbins
(1948), (1) $\Rightarrow$ (2). However, the converse is not always true.
Indeed, for $n \ge 2$ let

$$f_n(x) = \begin{cases} n/(n-1), & \text{if } (k-1)/n < x < k/n - 1/n^2,\ k = 1, 2, \dots, n \\ 0, & \text{otherwise} \end{cases}$$

and let $f_0$ be the uniform density on the interval $(0,1)$. It is
easy to see that for every $n$, $f_n$ is a density and if
$B_n = \{x \in (0,1) : f_n(x) > 0\}$, then

$$|f_n(x) - f_0(x)| = \begin{cases} 1/(n-1), & \text{if } x \in B_n \\ 1, & \text{if } x \in B_n^c \cap (0,1). \end{cases}$$

Since the sets $B_n$ and $B_n^c \cap (0,1)$ have Lebesgue measures
$(n - 1)/n$ and $1/n$ respectively, we obtain the relation

$$\int_0^1 |f_n(x) - f_0(x)|\,dx = \frac2n \ \Rightarrow\ \lim_{n \to \infty} \int |f_n(x) - f_0(x)|\,dx = 0,$$

that is, $f_n$ converges to $f_0$ in the mean of order 1. It now
remains to show that

$$f_n(x) \not\to f_0(x) = 1, \quad x \in (0,1).$$

For any fixed irrational number $x$ there exist infinitely many
rational numbers $j/k$ such that $j/k - 1/k^2 < x < j/k$. This fact and
the definition of $f_n$ imply that $f_n(x) = 0$ for infinitely many $n$
and for any fixed irrational $x \in (0,1)$. Furthermore, if $x$ is a
rational number in $(0,1)$, then $x = j/k$ for some positive integers
$j$ and $k$ with $j < k$, and moreover

$$f_n(x) = 0 \quad \text{for } n = k, 2k, 3k, \dots.$$

Thus for any $x \in (0,1)$ the densities $f_n(x)$ cannot converge to
$f_0(x) = 1$.
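The computation of the $L^1$ distance and the behaviour at rational points can be reproduced exactly. The sketch below assumes that $f_n$ equals $n/(n-1)$ on the open intervals $((k-1)/n,\ k/n - 1/n^2)$, $k = 1, \dots, n$, and vanishes elsewhere; the function names are ours.

```python
from fractions import Fraction
import math


def f_n(n: int, x: Fraction) -> Fraction:
    """Density n/(n-1) on ((k-1)/n, k/n - 1/n^2), k = 1..n, and 0 elsewhere (n >= 2)."""
    if not (0 < x < 1):
        return Fraction(0)
    k = math.floor(x * n) + 1  # the cell [(k-1)/n, k/n) that contains x
    left = Fraction(k - 1, n)
    right = Fraction(k, n) - Fraction(1, n * n)
    return Fraction(n, n - 1) if left < x < right else Fraction(0)


def l1_distance(n: int) -> Fraction:
    """Exact integral of |f_n - f_0| over (0, 1), with f_0 the uniform density."""
    measure_B = n * (Fraction(1, n) - Fraction(1, n * n))  # Lebesgue measure of {f_n > 0}
    return (Fraction(n, n - 1) - 1) * measure_B + (1 - measure_B)


if __name__ == "__main__":
    print([l1_distance(n) for n in (2, 10, 100)])  # equals 2/n
    x = Fraction(3, 7)
    # Along n = 7, 14, 21, ... the density vanishes at x = 3/7 ...
    print([f_n(7 * m, x) for m in range(1, 6)])
    # ... while for other n it can be positive.
    print(f_n(10, x))
```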


16.7. A version of the continuity theorem for distribution 
functions which does not hold for some densities 
Let $X_n$ be a r.v. with d.f. $F_n$, density $f_n$ and ch.f. $\phi_n$,
$n \ge 1$. The continuity theorem provides necessary and sufficient
conditions for the weak convergence of $\{F_n\}$ in terms of
$\{\phi_n\}$. Now we want to find conditions which relate the ch.f.s
$\{\phi_n\}$ and the densities $\{f_n\}$. For some r.v. $X_0$ with d.f.
$F_0$, density $f_0$ and ch.f. $\phi_0$ we introduce the following three
conditions:

$$\lim_{n \to \infty} f_n(x) = f_0(x) \quad \text{for almost all } x \in R^1, \tag{1}$$

$$F_n \xrightarrow{w} F_0 \quad \text{as } n \to \infty, \tag{2}$$

$$\lim_{n \to \infty} \phi_n(t) = \phi_0(t) \quad \text{for all } t \in R^1 \text{ and } \phi_0 \text{ is continuous at } 0. \tag{3}$$

By the continuity theorem we have (2) $\iff$ (3). According to
the Scheffé theorem (see Example 14.9), (1) $\Rightarrow$ (2). Example
14.9 also shows that in general (2) $\not\Rightarrow$ (1).

Thus we conclude that (1) $\Rightarrow$ (3) and can expect that in
general (3) $\not\Rightarrow$ (1). Let us illustrate by an example that
indeed (3) $\not\Rightarrow$ (1).

Consider the standard normal density
$\varphi(x) = (2\pi)^{-1/2} \exp(-\tfrac12 x^2)$ and its ch.f.
$\phi_0(t) = \exp(-\tfrac12 t^2)$.

Define the functions

$$f_\lambda(x) = \varphi(x)(1 - \cos \lambda x)/(1 - \phi_0(\lambda)), \quad x \in R^1, \tag{4}$$

$$\psi_\lambda(t) = [2\phi_0(t) - \phi_0(t + \lambda) - \phi_0(t - \lambda)]/[2(1 - \phi_0(\lambda))], \quad t \in R^1, \tag{5}$$

where $\lambda$ is any nonzero real number (e.g. take $\lambda = n$). It
is not difficult to check that for each $\lambda$, $f_\lambda(x)$,
$x \in R^1$, is a probability density function, $\psi_\lambda(t)$,
$t \in R^1$, is a ch.f., and moreover, $\psi_\lambda$ corresponds to
$f_\lambda$. Further, we find

$$\lim_{\lambda \to \infty} \psi_\lambda(t) = \phi_0(t) = \exp(-\tfrac12 t^2) \quad \text{for all } t \in R^1, \tag{6}$$

where the limiting function $\phi_0$ is continuous at 0 and thus (3) is
satisfied. However,

$$f_\lambda(x) \not\to \varphi(x) = (2\pi)^{-1/2} \exp(-\tfrac12 x^2) \quad \text{as } \lambda \to \infty, \tag{7}$$

and hence condition (1) does not hold.

Comparing (6) and (7) we see that in general the pointwise
convergence of the ch.f.s $\phi_n$ given by (3) is not enough to ensure
the convergence (1) of the densities $f_n$. At this point the following
result may be useful (see Feller 1971).

Let $\phi_n$ and $\phi$ be absolutely integrable ch.f.s such that

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} |\phi_n(t) - \phi(t)|\,dt = 0. \tag{8}$$

Then the corresponding d.f.s $F_n$ and $F$ have bounded continuous
densities $f_n$ and $f$ respectively, and (8) implies that

$$\lim_{n \to \infty} f_n(x) = f(x) \quad \text{uniformly in } x,\ x \in R^1. \tag{9}$$

Obviously, in the above specific example, condition (9) is not
satisfied (see (7)). It is easy to see that the pointwise convergence
given by (6) does not imply the integral convergence (8).
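A direct evaluation of (4) and (5) shows the contrast between (6) and (7). In this sketch the function names are ours and the test values of $\lambda$ are chosen only for illustration: along $\lambda = 2\pi m$ the density at $x = 1$ is essentially 0, along $\lambda = (2m+1)\pi$ it is essentially $2\varphi(1)$, while $\psi_\lambda(t)$ is already indistinguishable from $\phi_0(t)$.

```python
import math


def phi0(t: float) -> float:
    """Ch.f. of N(0, 1)."""
    return math.exp(-0.5 * t * t)


def std_normal_density(x: float) -> float:
    """Density of N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f_lam(lam: float, x: float) -> float:
    """Density (4): normal density times (1 - cos(lam x)) / (1 - phi0(lam))."""
    return std_normal_density(x) * (1.0 - math.cos(lam * x)) / (1.0 - phi0(lam))


def psi_lam(lam: float, t: float) -> float:
    """Ch.f. (5) corresponding to the density (4)."""
    return (2.0 * phi0(t) - phi0(t + lam) - phi0(t - lam)) / (2.0 * (1.0 - phi0(lam)))


if __name__ == "__main__":
    print(psi_lam(30.0, 0.7), phi0(0.7))          # the ch.f.s agree for large lam
    print(f_lam(2 * 100 * math.pi, 1.0))          # close to 0
    print(f_lam((2 * 100 + 1) * math.pi, 1.0))    # close to 2 * density(1)
    print(std_normal_density(1.0))
```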


16.8. Weak convergence of distribution functions does not 
imply convergence of the moments 


Let $F$ and $F_n$, $n \ge 1$, be d.f.s. Denote by $m_k$ and
$m_k^{(n)}$ their $k$th moments:

$$m_k = \int_{-\infty}^{\infty} x^k\,dF(x), \qquad m_k^{(n)} = \int_{-\infty}^{\infty} x^k\,dF_n(x), \quad k = 1, 2, \dots.$$

According to the Fréchet-Shohat theorem (see Feller 1971), if
$m_k^{(n)} \to m_k$ as $n \to \infty$ for all $k$ and the moment
sequence $\{m_k\}$ determines $F$ uniquely, then

$$F_n \xrightarrow{w} F \quad \text{as } n \to \infty. \tag{1}$$

(For such results also see the works of Kendall and Rao 1950,
Lukacs 1970.)

Now let us answer the converse question: does the weak
convergence (1) imply convergence of the moments $m_k^{(n)}$ to $m_k$?
By two examples we show that (1) can hold even if
$m_k^{(n)} \not\to m_k$ as $n \to \infty$ for any $k$.


(i) Consider the family of d.f.s $\{F_n, n \ge 1\}$ where


| ad 1 , 
Foe) = ( ~ 7) , en" /2 du + a Telit ity, Wie R’. 
It is easy to see that

$$\lim_{n \to \infty} F_n(x) = \Phi(x) \quad \text{for all } x \in R^1$$

where $\Phi$ is the standard normal d.f., that is,
$F_n \xrightarrow{w} \Phi$ as $n \to \infty$. However, the moments
$m_k^{(n)}$ of any order $k$ of $F_n$ tend to infinity as
$n \to \infty$ and hence $m_k^{(n)}$ cannot converge to the moments
$m_k$ of $N(0,1)$. Recall that here $m_{2k-1} = 0$,
$m_{2k} = (2k - 1)!!$, $k = 1, 2, \dots$.
(ii) Let F,, be the d.f. of a r.v. X, distributed uniformly on the 


interval [0, ” | and Fo be the d.f. of a degenerate r.v. Xo, for 
example, $X_0 = 0$. Define

$$G_n(x) = \frac1n F_n(x) + \Big(1 - \frac1n\Big) F_0(x), \quad x \in R^1,\ n \ge 1.$$

Then $\{G_n, n \ge 1\}$ is a sequence of d.f.s. The limit behaviour of
$\{G_n\}$ can easily be investigated in terms of the corresponding
ch.f.s $\{\psi_n\}$. Since


‘OO - . l ytin x l 
Un (t) = / gen dG, (2) — — e ae (1 Ls = 


fore 1 tn n 


we find that $\lim_{n \to \infty} \psi_n(t) = 1$, $t \in R^1$, which
implies that

$$\lim_{n \to \infty} G_n(x) = F_0(x)$$

for all $x \in R^1$ except $x = 0$ (the value of $X_0$; the only point
of jump of $F_0$).

It remains for us to clarify whether the moments $m_k^{(n)}$ of $G_n$
converge to the $m_k$ of $F_0$. We have


(n) in k nk 
no = x” dGy(xz) = —— > ow as nN CO 
: k+l 
— Lx) i 


for every $k$, $k = 1, 2, \dots$, while the moments $m_k$ of $F_0$ are all zero.


16.9. Weak convergence of a sequence of distributions does not 
always imply the convergence of the moment generating 
functions 


Recall first a version of the continuity theorem. Suppose
$\{F_n, n = 1, 2, \dots\}$ are d.f.s and the corresponding m.g.f.s
$\{M_n, n = 1, 2, \dots\}$, $M_n(z)$, exist for all $|z| \le r_0$ and
all $n$. If $F$ and $M$ is another pair of a d.f. and m.g.f. such that
$M_n(z) \to M(z)$ as $n \to \infty$ for all $|z| \le r_1$ where
$r_1 < r_0$, then $F_n \xrightarrow{w} F$.

Thus under general conditions the convergence of the m.g.f.s
implies the weak convergence of the corresponding d.f.s and this
motivates us to ask the inverse question: if $F_n \xrightarrow{w} F$,
does it follow that $M_n \to M$ as $n \to \infty$?

Intuitively we may guess that the answer is 'no' rather than
'yes', simply because when talking about a m.g.f. we assume at
least the existence of moments of any order. The latter is not
necessary for weak convergence.

A simple example shows that the answer to the above question
is negative. Consider the d.f.s $F$ and $F_n$, $n = 1, 2, \dots$,
defined by

$$F(x) = \begin{cases} 0, & \text{if } x < 0 \\ 1, & \text{if } x \ge 0, \end{cases} \qquad F_n(x) = \begin{cases} 0, & \text{if } x < -n \\ \tfrac12 + c_n \arctan(nx), & \text{if } -n \le x < n \\ 1, & \text{if } x \ge n, \end{cases}$$

where $c_n = 1/[2 \arctan(n^2)]$, so that $F_n$ is continuous at
$x = \pm n$.

It is easy to check that $F_n(x) \to F(x)$ as $n \to \infty$ at all
points of continuity of $F$. Hence $F_n \xrightarrow{w} F$. Since $F$ is
a degenerate distribution concentrated at 0, its m.g.f. $M(z) = 1$ for
all $z$. Further, the m.g.f. $M_n(z)$ of $F_n$,

$$M_n(z) = \int_{-n}^{n} e^{zx} \frac{c_n n}{1 + n^2 x^2}\,dx,$$

exists for all $z$. It is almost obvious that $M_n(z) \to M(z)$ as
$n \to \infty$ only for $z = 0$. If $z \ne 0$, $M_n(z) \not\to M(z)$ as
$n \to \infty$ since $|M_n(z)| \to \infty$ as $n \to \infty$.
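The growth of $M_n(z)$ for $z \ne 0$ can be observed numerically. The sketch below integrates $e^{zx}$ against the density $c_n n/(1 + n^2 x^2)$ on $[-n, n]$ with a midpoint rule. It assumes the normalisation $c_n = 1/(2\arctan(n^2))$, chosen so that the density integrates to 1; the function name and the number of grid cells are ours.

```python
import math


def mgf(n: int, z: float, steps: int = 200_000) -> float:
    """Midpoint-rule value of M_n(z), the integral of exp(z x) c_n n / (1 + n^2 x^2) over [-n, n]."""
    c_n = 1.0 / (2.0 * math.atan(n * n))  # assumed normalising constant
    h = 2.0 * n / steps
    total = 0.0
    for i in range(steps):
        x = -n + (i + 0.5) * h
        total += math.exp(z * x) * c_n * n / (1.0 + (n * x) ** 2)
    return total * h


if __name__ == "__main__":
    print(mgf(5, 0.0))  # total mass, close to 1
    for n in (2, 5, 10):
        print(n, mgf(n, 1.0))  # grows with n although F_n converges weakly
```

Even though almost all of the mass concentrates near 0, the thin Cauchy-type tails on $[-n, n]$ are weighted by $e^{zx}$, and that is what drives $M_n(z)$ to infinity.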


16.10. Weak convergence of a sequence of distribution 
functions does not always imply their convergence in the 
mean 

Let $F_0, F_1, F_2, \dots$ be d.f.s. Suppose for some $\beta > 0$ the
following relation holds:

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} |F_n(x) - F_0(x)|^\beta\,dx = 0. \tag{1}$$

From here it is easy to derive that $F_n \xrightarrow{w} F_0$ as
$n \to \infty$. Now let us analyse (1) but in the opposite direction.
Firstly, suppose that $F_n \xrightarrow{w} F_0$. The question is, under
what additional assumptions can we obtain a relation like (1) with a
suitable $\beta > 0$? One possible answer is contained in the following
result (see Laube 1973). If $F_n \xrightarrow{w} F_0$ and for some
$\gamma > 0$,

$$\sup_{n \ge 1} \int_{-\infty}^{\infty} |x|^\gamma\,dF_n(x) < \infty, \tag{2}$$

then $F_n$ tends to $F_0$ in the mean of order $\beta > 1/\gamma$, that
is, (2) and the weak convergence of $F_n$ to $F_0$ imply (1) with
$\beta > 1/\gamma$.

Our aim now is to show that (1) need not be true if we take
$\beta = 1/\gamma$. To see this, consider the following d.f.s:

$$F_0(x) = 1_{[0, \infty)}(x), \quad x \in R^1,$$

$$F_n(x) = \frac1n 1_{[-n, 0)}(x) + 1_{[0, \infty)}(x), \quad x \in R^1,\ n = 1, 2, \dots.$$

Then it can be easily seen that

$$\int_{-\infty}^{\infty} |x|\,dF_n(x) = 1, \qquad \lim_{n \to \infty} F_n(x) = 1_{[0, \infty)}(x) = F_0(x), \quad x \in R^1.$$

Obviously condition (2) is valid for $\gamma = 1$. However, relation (1)
does not hold for $\beta = 1$, that is, for $\beta = 1/\gamma$, since

$$\int_{-\infty}^{\infty} |F_n(x) - F_0(x)|\,dx = 1 \quad \text{for all } n.$$

Finally, note that relations like (1) can be used to obtain
estimates for the global convergence behaviour in the central limit
theorem (CLT) (see Laube 1973).
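Since $F_n - F_0$ equals $1/n$ on $[-n, 0)$ and vanishes elsewhere, both integrals in this example reduce to elementary expressions, and the boundary case $\beta = 1/\gamma$ can be compared with a larger $\beta$. The sketch below does this in exact arithmetic for integer exponents; the function names are ours.

```python
from fractions import Fraction


def mean_distance(n: int, beta: int) -> Fraction:
    """Integral of |F_n - F_0|**beta: the integrand is (1/n)**beta on [-n, 0), else 0."""
    return n * Fraction(1, n) ** beta


def abs_moment(n: int, gamma: int) -> Fraction:
    """Integral of |x|**gamma dF_n(x): F_n has mass 1/n at -n and mass 1 - 1/n at 0."""
    return Fraction(1, n) * Fraction(n) ** gamma


if __name__ == "__main__":
    for n in (1, 10, 1000):
        # gamma = 1 moment, beta = 1 distance (stays 1), beta = 2 distance (tends to 0)
        print(n, abs_moment(n, 1), mean_distance(n, 1), mean_distance(n, 2))
```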


SECTION 17. CENTRAL LIMIT THEOREM 
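Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s defined on the
probability space $(\Omega, \mathcal{F}, P)$. As usual, denote

$$S_n = X_1 + \cdots + X_n, \quad a_k = EX_k, \quad A_n = ES_n = a_1 + \cdots + a_n,$$
$$\sigma_k^2 = VX_k, \quad s_n^2 = VS_n = \sigma_1^2 + \cdots + \sigma_n^2.$$

We say that the sequence $\{X_n\}$ satisfies the central limit
theorem (CLT) (or, that $\{X_n\}$ obeys the CLT) if

$$\lim_{n \to \infty} P[(S_n - A_n)/s_n \le x] = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du \quad \text{for all } x \in R^1.$$

Let $F_k$ denote the d.f. of $X_k$. Clearly, we can suppose that
$EX_k = 0$ for all $k \ge 1$. Now introduce the following three
conditions:

$$\text{(L)} \quad \lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|u| \ge \varepsilon s_n} u^2\,dF_k(u) = 0 \quad \text{for each } \varepsilon > 0$$

(Lindeberg condition);

$$\text{(F)} \quad \lim_{n \to \infty} \max_{1 \le k \le n} \frac{\sigma_k^2}{s_n^2} = 0$$

(Feller condition);

$$\text{(UAN)} \quad \lim_{n \to \infty} \max_{1 \le k \le n} P[|X_{nk}| \ge \varepsilon] = 0 \quad \text{for each } \varepsilon > 0, \text{ where } X_{nk} = X_k/s_n$$

(uniform asymptotic negligibility condition (u.a.n. condition)).

Now we shall formulate in a compact form two fundamental
results.

Lindeberg theorem.

$$\text{(L)} \Rightarrow \text{(CLT)}$$

Lindeberg-Feller theorem. If (F), then

$$\text{(L)} \iff \text{(CLT)}$$

or if (UAN), then

$$\text{(L)} \iff \text{(CLT)}.$$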


The proof of these theorems and several other related topics can 
be found in many books. We refer the reader to the books by 
Gnedenko (1962), Fisz (1963), Breiman (1968), Billingsley (1968, 
1995), Thomasian (1969), Rényi (1970), Feller (1971), Ash 
(1972), Chung (1974), Chow and Teicher (1978), Loéve (1978), 
Laha and Rohatgi (1979) and Shiryaev (1995). 

The examples below demonstrate the range of validity of the 
CLT and examine the importance of the conditions under which 
the CLT does hold. Some related questions are also considered. 


17.1. Sequences of random variables which do not satisfy the 
central limit theorem 


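(i) Let $X_1, X_2, \dots$ be independent r.v.s defined as follows:
$P[X_1 = \pm 1] = \tfrac12$ and for $k \ge 2$ and some $c$,
$0 < c < 1$,

$$P[X_k = \pm 1] = \frac12(1 - c), \quad P[X_k = \pm k] = \frac{1}{2k^2} c, \quad P[X_k = 0] = \Big(1 - \frac{1}{k^2}\Big) c.$$

First let us check if the Lindeberg condition is satisfied. Since
$VX_k = 1$ for all $k$, we have $s_n^2 = n$ and

$$\frac{1}{s_n^2} \sum_{k=1}^{n} E\{X_k^2\,1[|X_k| > \varepsilon s_n]\} = \frac1n \sum_{k=1}^{n} E\{X_k^2\,1[|X_k| > \varepsilon\sqrt n]\}.$$

If $n$ is large enough and such that $\varepsilon\sqrt n > 1$,
$\varepsilon > 0$ fixed, then we find

$$\frac1n \sum_{k=[\varepsilon\sqrt n]}^{n} k^2 P[|X_k| = k] \approx \frac1n (n - \varepsilon\sqrt n) c \to c > 0.$$

Therefore the given sequence $\{X_k\}$ does not satisfy the
Lindeberg condition. However, this does not mean that the CLT
fails to hold for the sequence $\{X_k\}$ because the Lindeberg
condition is only a sufficient condition. Actually the sequence
$\{X_k\}$ does not obey the CLT. This follows from the fact that
$X_k/s_n$ satisfy the u.a.n. condition. Indeed,

$$P[|X_k/s_n| > \varepsilon] = P[|X_k| > \varepsilon\sqrt n] = \begin{cases} 0, & \text{if } k \le \varepsilon\sqrt n \\ c/k^2, & \text{if } k > \varepsilon\sqrt n. \end{cases}$$

Thus

$$\max_{1 \le k \le n} P[|X_k/s_n| > \varepsilon] \le \frac{c}{\varepsilon^2 n} \to 0 \quad \text{as } n \to \infty.$$

Now if $S_n/s_n \xrightarrow{d} \xi$ where $\xi \sim N(0,1)$, then this
and the u.a.n. condition would imply the Lindeberg condition which, as
we have seen above, is not satisfied.

Thus our final conclusion is that the Lindeberg condition is not
satisfied and the CLT does not hold.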
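The Lindeberg sum in part (i) can be evaluated directly from the distribution of $X_k$. The sketch below assumes $P[X_k = \pm 1] = \tfrac12(1 - c)$, $P[X_k = \pm k] = c/(2k^2)$ for $k \ge 2$ and $s_n^2 = n$; the function name is ours. For large $n$ the sum approaches $c$, not 0.

```python
import math


def lindeberg_sum(n: int, eps: float, c: float) -> float:
    """(1/n) * sum over k of E[X_k^2 ; |X_k| > eps * sqrt(n)] for the sequence of part (i)."""
    threshold = eps * math.sqrt(n)  # eps * s_n with s_n = sqrt(n)
    total = 0.0
    for k in range(1, n + 1):
        if k == 1:
            # X_1 = +1 or -1 with probability 1/2 each, so E[X_1^2] = 1.
            if 1 > threshold:
                total += 1.0
            continue
        if 1 > threshold:
            total += 1.0 - c  # values +-1, total probability 1 - c
        if k > threshold:
            total += c        # values +-k contribute k^2 * (c / k^2) = c
    return total / n


if __name__ == "__main__":
    for n in (10**2, 10**3, 10**5):
        print(n, lindeberg_sum(n, 0.5, 0.3))
```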


(ii) Let the r.v. $Y$ take two values, 1 and $-1$, with probability
$\tfrac12$ each, and let $\{Y_k, k \ge 1\}$ be a sequence of independent
copies of $Y$. Define a new sequence $\{X_k, k \ge 1\}$ where
$X_k = \sqrt{15}\,Y_k/4^k$ and let $S_n = X_1 + \cdots + X_n$. Since
$EY = 0$ and $VY = 1$ we easily find that

$$ES_n = 0 \quad \text{and} \quad s_n^2 = VS_n = 1 - \Big(\frac{1}{16}\Big)^n.$$

Thus $s_n \approx 1$ for large $n$ (this is why the factor $\sqrt{15}$
was involved). On the other hand it is obvious that

$$P[|S_n| \le \tfrac12] = 0 \quad \text{for every } n \ge 1.$$

Therefore the probabilities $P[S_n \le x]$ cannot converge to the
standard normal d.f. $\Phi(x)$ for all $x$, so the sequence $\{X_k\}$
does not obey the CLT. Note that in this example $X_1$ 'dominates' the
other terms.
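For moderate $n$ the claim $P[|S_n| \le \tfrac12] = 0$ can be confirmed by brute force over all sign patterns. The following sketch does this for $X_k = \sqrt{15}\,Y_k/4^k$ with $Y_k = \pm 1$; the function names are ours.

```python
from fractions import Fraction
import itertools
import math


def min_abs_sum(n: int) -> float:
    """Smallest value of |S_n| over all 2**n sign patterns of Y_1, ..., Y_n."""
    weights = [math.sqrt(15.0) / 4 ** k for k in range(1, n + 1)]
    return min(
        abs(sum(s * w for s, w in zip(signs, weights)))
        for signs in itertools.product((1, -1), repeat=n)
    )


def variance(n: int) -> Fraction:
    """Exact variance of S_n: sum over k of 15 / 16**k."""
    return sum(Fraction(15, 16 ** k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 4, 8, 12):
        # The minimum stays above sqrt(15)/6, which is about 0.645.
        print(n, min_abs_sum(n), float(variance(n)))
```

The gap around 0 comes from the first term alone: $|X_1| = \sqrt{15}/4$ exceeds the total of all remaining terms, $\sqrt{15}/12$, by $\sqrt{15}/6 > \tfrac12$.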
(iii) Suppose that for each $n$,

$$S_n = X_{n1} + X_{n2} + \cdots + X_{nn}$$

where $X_{n1}, \dots, X_{nn}$ are independent r.v.s and each has a
Poisson distribution with mean $1/(2n)$. We could expect that the
distribution of the normalized quantity $(S_n - ES_n)/\sqrt{VS_n}$ will
tend to the standard normal d.f. $\Phi$. However, this is not the case,
in spite of the fact that

$$P[X_{nk} = 0] = e^{-1/(2n)} \approx 1 \quad \text{for large } n,$$

that is, each $X_{nk}$ is 'almost' zero. It is enough to note that for
each $n$ the sum $S_n$ has a Poisson distribution with parameter
$\tfrac12$. In particular, $P[S_n = 0] = e^{-1/2}$, implying that the
distribution of $(S_n - ES_n)/\sqrt{VS_n}$ cannot be close to $\Phi$.
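How far the standardized Poisson($\tfrac12$) law is from $\Phi$ can be quantified by the Kolmogorov (uniform) distance between the two d.f.s. This is our choice of measure, not one used in the example. Since the discrete d.f. is a step function, the supremum is attained at an atom, either just before or just after the jump.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def poisson_pmf(lam: float, j: int) -> float:
    """P[N = j] for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** j / math.factorial(j)


def kolmogorov_distance(lam: float = 0.5, j_max: int = 40) -> float:
    """sup_x |P[(N - lam)/sqrt(lam) <= x] - Phi(x)|, checked at the atoms j = 0..j_max."""
    dist, cdf = 0.0, 0.0
    for j in range(j_max + 1):
        z = (j - lam) / math.sqrt(lam)
        dist = max(dist, abs(cdf - normal_cdf(z)))  # left limit at the atom
        cdf += poisson_pmf(lam, j)
        dist = max(dist, abs(cdf - normal_cdf(z)))  # value at the atom
    return dist


if __name__ == "__main__":
    print(poisson_pmf(0.5, 0))      # the atom e^{-1/2} at S_n = 0
    print(kolmogorov_distance())    # does not depend on n, so it cannot tend to 0
```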


17.2. How is the central limit theorem connected with the 
Feller condition and the uniform negligibility condition? 


Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s such that
$X_k \sim N(0, \sigma_k^2)$ where $\sigma_1^2 = 1$ and
$\sigma_k^2 = 2^{k-2}$ for $k \ge 2$. Then $S_n = X_1 + \cdots + X_n$
has variance $s_n^2 = 2^{n-1}$. Since $X_k/s_k \sim N(0, \tfrac12)$ we
find that

$$S_n/s_n \sim N(0,1) \quad \text{for each } n$$

and therefore the CLT for $\{X_k\}$ is satisfied trivially. Further,

$$\lim_{n \to \infty} \max_{1 \le k \le n} \frac{\sigma_k^2}{s_n^2} = \lim_{n \to \infty} \frac{2^{n-2}}{2^{n-1}} = \frac12 \ne 0$$

and moreover

$$\max_{1 \le k \le n} P[|X_k|/s_n > \varepsilon] \ge P[|X_n|/s_n > \varepsilon] = 1 - \frac{1}{\sqrt\pi} \int_{-\varepsilon}^{\varepsilon} e^{-u^2}\,du > 0.$$

Hence neither the Feller condition nor the u.a.n. condition holds.
This implies that the Lindeberg condition also does not hold.
However, despite these facts the sequence $\{X_n\}$ obeys the CLT.
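The variance bookkeeping in this example is easy to confirm in exact arithmetic. The sketch below assumes $\sigma_1^2 = 1$ and $\sigma_k^2 = 2^{k-2}$ for $k \ge 2$; the function names are ours. The lower bound in the u.a.n. condition is $P[|Z| > \varepsilon]$ for $Z \sim N(0, \tfrac12)$, which equals the complementary error function at $\varepsilon$.

```python
from fractions import Fraction
import math


def variances(n: int):
    """sigma_1^2 = 1 and sigma_k^2 = 2**(k-2) for k = 2, ..., n."""
    return [1] + [2 ** (k - 2) for k in range(2, n + 1)]


def feller_ratio(n: int) -> Fraction:
    """max_k sigma_k^2 / s_n^2."""
    v = variances(n)
    return Fraction(max(v), sum(v))


def uan_lower_bound(eps: float) -> float:
    """P[|X_n| / s_n > eps] for X_n / s_n ~ N(0, 1/2), i.e. erfc(eps)."""
    return math.erfc(eps)


if __name__ == "__main__":
    print([sum(variances(n)) for n in range(1, 8)])  # 2**(n-1)
    print([feller_ratio(n) for n in range(2, 8)])    # always 1/2
    print(uan_lower_bound(1.0))                      # positive and free of n
```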


17.3. Two ‘equivalent’ sequences of random variables such that 
one of them obeys the central limit theorem while the 
other does not 


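Consider again the sequence of independent r.v.s $\{X_k, k \ge 1\}$ from
Example 17.1(i): namely, $P[X_1 = \pm 1] = \tfrac12$ and for $k \ge 2$
and $0 < c < 1$,

$$P[X_k = \pm 1] = \frac12(1 - c), \quad P[X_k = \pm k] = \frac{1}{2k^2} c, \quad P[X_k = 0] = \Big(1 - \frac{1}{k^2}\Big) c.$$

Using truncation we define the sequence
$\{\tilde X_{nk}, k = 1, \dots, n, n \ge 1\}$ by

$$\tilde X_{nk} = \begin{cases} X_k, & \text{if } |X_k| \le \sqrt n \\ 0, & \text{otherwise.} \end{cases}$$

Denote $\tilde S_n = \tilde X_{n1} + \cdots + \tilde X_{nn}$,
$\tilde s_n^2 = V\tilde S_n$. Since $V\tilde X_{nk} = 1$ if
$k \le \sqrt n$ and $V\tilde X_{nk} = 1 - c$ if $k > \sqrt n$, we find
that
$\tilde s_n^2 = [\sqrt n] + (1 - c)(n - [\sqrt n]) \sim (1 - c) n$ and
thus

$$\frac{1}{\tilde s_n^2} \sum_{k=1}^{n} E\{\tilde X_{nk}^2\,1[|\tilde X_{nk}| > \varepsilon \tilde s_n]\} \approx \frac{(\sqrt n - \varepsilon\sqrt{n(1 - c)})\,c}{n(1 - c)} \to 0.$$

Therefore the Lindeberg condition holds and
$\tilde S_n/\tilde s_n \xrightarrow{d} \eta$ where $\eta$ is distributed
$N(0,1)$. So the sequence $\{\tilde X_{nk}\}$ obeys the CLT.

We shall show that the sequences $\{S_n\}$ and $\{\tilde S_n\}$ (not
$\{X_k\}$ and $\{\tilde X_{nk}\}$) are 'equivalent' in the following
sense:

$$P[S_n \ne \tilde S_n] \to 0 \quad \text{as } n \to \infty. \tag{1}$$

Indeed,

$$P[S_n \ne \tilde S_n] \le \sum_{k=1}^{n} P[X_k \ne \tilde X_{nk}] = \sum_{k=1}^{n} P[|X_k| > \sqrt n] \le \sum_{k > \sqrt n} P[|X_k| = k].$$

Therefore

$$P[|X_k| = k] = \frac{c}{k^2} \ \text{ and } \ \sum_{k} \frac{1}{k^2} < \infty \ \Rightarrow\ P[S_n \ne \tilde S_n] \to 0 \ \text{ as } n \to \infty.$$

However (see Example 17.1) the sequence $\{X_k\}$ does not obey
the CLT.

Thus we have constructed two sequences, $\{X_k\}$ and
$\{\tilde X_{nk}\}$, which are equivalent in the sense of (1) and such
that the CLT holds for $\{\tilde X_{nk}\}$ but does not hold for
$\{X_k\}$. Note again that the Lindeberg condition is valid for
$\{\tilde X_{nk}\}$ but not for $\{X_k\}$.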


17.4. If the sequence of random variables {X,, } satisfies the 
central limit theorem, what can we say about the 
variance of 9n/VVSn? 

Consider two sequences, {X7, k = 1} and {Y¥;, k = 1}, each 

consisting of independent r.v.s and such that 


BXe Sel Seek), PX Se) ae Py | Se. 


Denote 


ad 


8S,=Mi4+-:-+Y,, %S, = Xp t+: +X. 


~ d 
Obviously the sequence {Y,, } obeys the CLT: that is, Sn/V/n—r€ 


where ¢ ~ N(0,1). The truncation principle (see Gnedenko 1962; 
Feller 1971), when applied to the sequence {X; }, shows that 


S/N has the same asymptotic behaviour as that of Sin/Vn- Thus 
os d : 
we conclude that Sn/V/" — as n — o where yn ~ N(O,1). 
Then we can expect intuitively that 
ViS,/Vn] 31 and V[S,//n] 71 asn— oo. 
For the sequence { Y; } we have EY; =0, VY; = 1. Thus for each 
nN, 
1=V[S,//n] + lasn—- oo. 
On the other hand, for {X; } we find EX; = 0, VX, =2 - Wk ? 
and 


Th 


: 1 1 iol 
Visn/val == 9° (2- GG) =2-2 72 7? 2asn > 00 
2 “k=1 


ema | 


(since iil) k*) < 00), that is VISn/Vn| * Las we assumed. 
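Therefore the CLT does not ensure in general the convergence of the moments of the normed sum $S_n/\sqrt{n}$ to the moments of the normal distribution N(0,1). For the convergence of the moments we need some additional integrability conditions, uniform in $n$. In particular, if $S_n/\sqrt{n} \xrightarrow{d} \xi$, then

$$\sup_n E\bigl[|S_n/\sqrt{n}|^{2+\delta}\bigr] < \infty \ \text{ for some } \delta > 0 \ \Rightarrow\ V[S_n/\sqrt{n}] \to V\xi.$$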
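The limit of the variance in Example 17.4 can be checked directly. The following is a minimal sketch, assuming only the variances $VX_k = 2 - 1/k^2$ found above; the function name `variance_of_normed_sum` is introduced here for illustration and does not come from the text.

```python
from fractions import Fraction

def variance_of_normed_sum(n):
    """Exact V[S_n / sqrt(n)] for independent X_k with V X_k = 2 - 1/k^2."""
    total = sum(Fraction(2) - Fraction(1, k * k) for k in range(1, n + 1))
    return total / n

# The values increase towards 2, not towards the variance 1 of the N(0,1) limit.
for n in (1, 10, 100, 1000):
    print(n, float(variance_of_normed_sum(n)))
```

Exact rational arithmetic is used so that the slow approach to 2 is not blurred by rounding.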


17.5. Not every interval can be a domain of normal 
convergence 


Suppose $\{X_n, n \ge 1\}$ is a sequence of i.i.d. r.v.s which satisfies the CLT. Denote by $F_n$ the d.f. of $(S_n - ES_n)/(VS_n)^{1/2}$ where $S_n = X_1 + \dots + X_n$. The uniform convergence $F_n(x) \to \Phi(x)$, $x \in \mathbb{R}^1$, implies

$$(1)\qquad \lim_{n\to\infty} \frac{1 - F_n(x)}{1 - \Phi(x)} = 1 \quad\text{uniformly in } x \text{ on any finite interval of } \mathbb{R}^1.$$


Note that (1) will hold uniformly on intervals of the type $[0, b_n]$ whose length $b_n$ increases with $n$. In general, intervals for which (1) holds are called domains of normal convergence. Obviously such intervals exist, but we now show that not every interval can be a domain of normal convergence.

Consider $X_1, X_2, \dots$ to be independent Bernoulli r.v.s with parameter $p$: that is, $P[X_1 = 1] = p = 1 - P[X_1 = 0]$. Obviously the sequence $\{X_n\}$ obeys the CLT. If $S_n = X_1 + \dots + X_n$, then $ES_n = np$, $s_n^2 = VS_n = np(1-p)$ and

$$1 - F_n(x) = P\Bigl[(np(1-p))^{-1/2} \sum_{k=1}^{n} (X_k - p) > x\Bigr] = P\Bigl[\sum_{k=1}^{n} X_k > x\,(np(1-p))^{1/2} + np\Bigr].$$

Since $\sum_{k=1}^{n} X_k \le n$, for an arbitrary $x > (n(1-p)/p)^{1/2}$ we obtain the equality

$$[1 - F_n(x)]/[1 - \Phi(x)] = 0$$

which clearly contradicts (1). Therefore (1) cannot hold for any interval of the type $[0, O(\sqrt{n})]$. In particular the interval $[0, c_p\sqrt{n}]$, where $c_p > ((1-p)/p)^{1/2}$ ($p$ is fixed), cannot be a domain of normal convergence.
Finally note that intervals of the type $[0, o(\sqrt{n})]$ are domains of normal convergence. This follows from the well known Berry-Esseen estimates in the CLT (see Feller 1971; Chow and Teicher 1978; Shiryaev 1995).
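The vanishing of the upper tail for Bernoulli sums in Example 17.5 can be seen exactly for a fixed $n$. The sketch below assumes the illustrative values $n = 50$, $p = 0.3$ (not taken from the text) and evaluates $1 - F_n(x)$ from the binomial distribution; the helper name `upper_tail` is hypothetical.

```python
from math import comb, sqrt

def upper_tail(n, p, x):
    """Exact 1 - F_n(x) = P[(S_n - np)/sqrt(np(1-p)) > x], S_n ~ Binomial(n, p)."""
    threshold = x * sqrt(n * p * (1 - p)) + n * p
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if k > threshold)

n, p = 50, 0.3                       # illustrative values only
edge = sqrt(n * (1 - p) / p)         # the bound (n(1-p)/p)^{1/2} from the text
print(upper_tail(n, p, 0.5 * edge))  # small but strictly positive
print(upper_tail(n, p, 1.01 * edge)) # zero: S_n can never exceed n
```

Beyond `edge` the numerator of the ratio in (1) is identically zero while $1 - \Phi(x)$ stays positive, which is the contradiction used in the example.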


17.6. The central limit theorem does not always hold for 
random sums of random variables 

Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s which satisfies the CLT. Take another sequence $\{\nu_n, n \ge 1\}$ of integer-valued r.v.s such that $\nu_n \to \infty$ as $n \to \infty$ and define $T_n = S_{\nu_n} = X_1 + \dots + X_{\nu_n}$ and $b_n^2 = VT_n$. If

$$\lim_{n\to\infty} P[(T_n - ET_n)/b_n \le x] = \Phi(x), \quad x \in \mathbb{R}^1,$$

we say that the CLT holds for the random sums $\{S_{\nu_n}\}$ generated by $\{X_n\}$ and $\{\nu_n\}$.

In the next two examples we show that the CLT does not always hold for $\{S_{\nu_n}\}$.


In both cases $\{X_n, n \ge 1\}$ is a sequence of i.i.d. r.v.s such that $P[X_1 = \pm 1] = \tfrac12$. Obviously if $\nu_n = n$ a.s. for each $n$, then $T_n = S_n = X_1 + \dots + X_n$, $b_n^2 = n$ and $T_n/b_n \xrightarrow{d} \xi$ where $\xi \sim N(0,1)$.


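(i) Define the sequence $\{\nu_n, n \ge 0\}$ as follows:

$$\nu_0 = 0 \quad\text{and}\quad \nu_n = \min\{k > \nu_{n-1} : S_k = (-1)^n n\} \quad\text{for } n \ge 1.$$

Then $\nu_n \to \infty$ as $n \to \infty$ and $T_n = S_{\nu_n} = (-1)^n n$ a.s. With the norming $b_n = n$ (so that $b_n^2 = E[T_n^2] = n^2$) we clearly have

$$P[T_n/b_n = (-1)^n] = 1.$$

It follows that the distribution of $T_n/b_n$ does not have a limit as $n \to \infty$ and hence the CLT cannot be valid for the random sums $\{S_{\nu_n}\}$.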
(ii) Let $\{\nu_n, n \ge 1\}$ be independent r.v.s such that $\nu_n$ takes the values $n$ and $2n$, with probabilities $p$ and $q = 1 - p$ respectively. Suppose additionally that $\{\nu_n\}$ is independent of $\{X_n\}$. Then

$$b_n^2 = VT_n = p\,E[S_n^2] + q\,E[S_{2n}^2] = (1 + q)\,n.$$

It is easy to check that $T_n/b_n$ does not converge in distribution to a r.v. $\xi \sim N(0,1)$. More precisely, $P[T_n/b_n \le x]$ converges to the mixture of the distributions of two r.v.s, $\xi_1 \sim N(0, (1+q)^{-1})$ and $\xi_2 \sim N(0, 2(1+q)^{-1})$, with weights $p$ and $q$ respectively.
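One quick way to see that the limiting mixture in case (ii) is not N(0,1) is to compare fourth moments. The sketch below assumes the two component variances $(1+q)^{-1}$ and $2(1+q)^{-1}$ stated above and uses the standard fact that a centred normal law with variance $v$ has fourth moment $3v^2$; the function name `mixture_moments` is introduced here for illustration.

```python
def mixture_moments(p):
    """Second and fourth moments of the mixture p*N(0, v1) + q*N(0, v2)."""
    q = 1.0 - p
    v1 = 1.0 / (1.0 + q)   # limiting variance of T_n/b_n given nu_n = n
    v2 = 2.0 / (1.0 + q)   # limiting variance of T_n/b_n given nu_n = 2n
    second = p * v1 + q * v2
    fourth = 3.0 * (p * v1**2 + q * v2**2)
    return second, fourth

for p in (1.0, 0.5, 0.1):
    second, fourth = mixture_moments(p)
    print(p, second, fourth)   # a standard normal law would give (1, 3)
```

The second moment is always 1, as the norming by $b_n$ requires, but the fourth moment equals 3 only in the degenerate cases $q = 0$ or $q = 1$.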


17.7. Sequences of random variables which satisfy the integral 
but not the local central limit theorem 


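Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s. Denote by $F_n$ and $f_n$ respectively the d.f. and the density of $(S_n - ES_n)/s_n$ where as usual $S_n = X_1 + \dots + X_n$, $s_n^2 = VS_n$. Let us set down the following relations:

$$(1)\qquad \lim_{n\to\infty} F_n(x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du, \quad x \in \mathbb{R}^1;$$

$$(2)\qquad \lim_{n\to\infty} f_n(x) = \varphi(x) = (2\pi)^{-1/2} e^{-x^2/2}, \quad x \in \mathbb{R}^1.$$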


Recall that if (1) holds we say that the sequence $\{X_n\}$ obeys the integral CLT, while in case (2) we say that $\{X_n\}$ obeys the local CLT (for the densities). It is easy to see that (2) $\Rightarrow$ (1). However, in general weak convergence does not imply convergence of the corresponding densities (see Example 14.9). Note that in (1) and (2) the limit distribution is N(0,1). Question: is the implication (1) $\Rightarrow$ (2) true? Two examples will be considered where (1) $\not\Rightarrow$ (2). In the first example the variables are identically distributed while in the second they have different distributions.


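(i) Let $X$ be a r.v. with density

$$(3)\qquad f(x) = \begin{cases} 0, & \text{if } |x| \ge e^{-1}, \\ 1/(2|x|\log^2|x|), & \text{if } |x| < e^{-1}, \end{cases} \qquad\text{with } f(0) = c,\ 0 < c < \infty.$$

The density $f$ is unbounded around $x = 0$; however, since $X$ is a bounded r.v., the sequence $\{X_n, n \ge 1\}$ of independent copies of $X$ satisfies the (integral) CLT. So the aim is to study the limit behaviour of the density $f_n$ of $(X_1 + \dots + X_n)/(\sigma\sqrt{n})$ where

$$\sigma^2 = VX = \int_{-e^{-1}}^{e^{-1}} \frac{x^2}{2|x|\log^2|x|}\,dx, \qquad 0 < \sigma^2 < \infty.$$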
If $g_2$ is the density of the sum $X_1 + X_2$ then $g_2$ is expressed by the convolution

$$g_2(x) = \int_{-e^{-1}}^{e^{-1}} f(u)\,f(x - u)\,du.$$

Let us now try to find a lower bound for $g_2$. It is enough to consider $x$ in a neighbourhood of 0; in particular we can assume that $|x| < e^{-1}$ and, even more, that $0 < x < e^{-1}$. Then $g_2(x) \ge \int_{-x}^{x} f(u)\,f(x - u)\,du$. Since $f(x - u)$ reaches its minimum in the domain $|u| \le x$ at $u = 0$, we have

$$g_2(x) \ge \frac{1}{2x\log^2 x} \int_{-x}^{x} \frac{du}{2|u|\log^2|u|} = \frac{1}{2x\,|\log^3 x|}.$$


Analogously we establish that in a neighbourhood of 0 the density $g_3$ of the sum $X_1 + X_2 + X_3$ satisfies the inequality

$$g_3(x) > \frac{c_3}{|x|\log^4|x|}, \qquad c_3 = \text{constant} > 0.$$

In general, if $g_n$ is the density of $X_1 + \dots + X_n$ we find that around 0,

$$g_n(x) > \frac{c_n}{|x|\,|\log^{n+1}|x||}, \qquad c_n = \text{constant} > 0.$$

Thus for each $n$, $g_n(x)$ takes an infinite value at $x = 0$. Since $f_n$ is obtained from $g_n$ by suitable norming, $f_n(x)$ cannot converge to $\varphi(x)$ as $n \to \infty$.

Therefore the sequence $\{X_n\}$ defined by the density (3) does not obey the local CLT although the integral CLT holds.
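The two properties of the density in case (i) that drive the argument, total mass 1 and unboundedness near the origin, can be checked from the antiderivative $-1/\log x$ of $1/(x\log^2 x)$. This is a minimal sketch assuming the form $f(x) = 1/(2|x|\log^2|x|)$ on $0 < |x| < e^{-1}$; the helper names `f` and `mass_outside` are illustrative.

```python
from math import exp, log

def f(x):
    """Density (3) away from the origin; the finite value c at x = 0 is ignored."""
    ax = abs(x)
    if ax == 0.0 or ax >= exp(-1):
        return 0.0
    return 1.0 / (2.0 * ax * log(ax) ** 2)

def mass_outside(delta):
    """P[delta < |X| < 1/e] = 1 + 1/log(delta), for 0 < delta < 1/e."""
    return 1.0 + 1.0 / log(delta)

for delta in (1e-2, 1e-10, 1e-100, 1e-300):
    print(delta, f(delta), mass_outside(delta))
```

The mass outside $(-\delta, \delta)$ tends to 1 only logarithmically slowly, while $f(\delta)$ blows up: a large share of the probability sits extremely close to 0, which is why the singularity survives convolution.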


(ii) Let $\{X_n, n \ge 1\}$ be independent r.v.s where $X_n$ has density


o. Tae see oh?" S a) el 
Q, otherwise. 


(4) f,(2) = 


It is easy to see that $EX_n = 0$, $VX_n = \tfrac12 + 5/(3\cdot 2^{2n+1})$. Then for an arbitrary $k \ge 1$, $\tfrac12 < VX_k < 1$, the Lindeberg condition is satisfied and hence the sequence $\{X_n\}$ obeys the (integral) CLT.


Denote by $g_k(x)$, $x \in \mathbb{R}^1$, the density of the sum $S_k = X_1 + \dots + X_k$. Then for $k = 2$, $g_2$ is the convolution of $f_1$ and $f_2$, that is

$$g_2(x) = (f_1 * f_2)(x) = \int_{-\infty}^{\infty} f_1(u)\,f_2(x - u)\,du.$$

Let us find the value of $g_2(x)$ at the point $x = \tfrac12$. By (4) we have


o 2 <lul<a 


fi(u) #0, if -—Z<u = =, 
fo(5 —u) #0 if esuss or e<lu<Z 


Comparing the intervals where $f_1 \ne 0$ and $f_2 \ne 0$ we see that $g_2(\tfrac12) = 0$. Analogously we find that

$$g_3(\tfrac12) = (g_2 * f_3)(x)\big|_{x=1/2} = \int_{-\infty}^{\infty} g_2(u)\,f_3(x - u)\,du\,\Big|_{x=1/2} = 0$$

and, more generally, that $g_n(\tfrac12) = 0$ for all $n \ge 2$. It is not difficult to see that $g_n(x) = 0$ for all $x$ of the form $x = \tfrac12(2m + 1)$, $m = 0, \pm1, \pm2, \dots$ and finally that $g_n(x) = 0$ for all $x = \tfrac12(2m + 1) + \theta$ where $m = 0, \pm1, \pm2, \dots$ and $|\theta| < \tfrac14$.

The sum $S_n = X_1 + \dots + X_n$ has $ES_n = 0$ and $VS_n = s_n^2 = \tfrac{n}{2} + \tfrac{5}{18}(1 - 2^{-2n})$. Since the density $g_n$ of $S_n$ and the density $p_n$ of $S_n/s_n$ satisfy the relation $p_n(x) = s_n g_n(x s_n)$, we have to study the behaviour of the quantity $s_n g_n(x s_n)$ as $n \to \infty$.


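Again, take $x = \tfrac12$. Then

$$s_n g_n(\tfrac12 s_n) = \bigl[\tfrac{n}{2} + \tfrac{5}{18}(1 - 2^{-2n})\bigr]^{1/2} \cdot g_n\bigl(\tfrac12\bigl[\tfrac{n}{2} + \tfrac{5}{18}(1 - 2^{-2n})\bigr]^{1/2}\bigr).$$

If $n$ is of the form $n = 2(2N + 1)^2$, then the argument of $g_n$ becomes

$$\tfrac12(2N + 1)\bigl[1 + \tfrac{5}{18}\bigl(1 - 2^{-4(2N+1)^2}\bigr)(2N + 1)^{-2}\bigr]^{1/2}.$$

For large $N$ this expression takes the form $\tfrac12(2N + 1) + \theta$ with $|\theta| < \tfrac14$. From the properties of $g_n$ established above we conclude that

$$s_n g_n(\tfrac12 s_n) = 0 \quad\text{for all sufficiently large } n \text{ of this form}.$$

This implies that

$$\liminf_{n\to\infty} p_n(\tfrac12) = 0.$$

However, $\varphi(\tfrac12) \ne 0$ and thus relation (2) is not possible. Therefore the sequence $\{X_n\}$ defined by the densities (4) does not obey the local CLT.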

General conditions ensuring convergence of both the d.f.s and 
the densities are described by Gnedenko and Kolmogorov (1954). 


SECTION 18. DIVERSE LIMIT THEOREMS 


In this section we have collected examples dealing with different 
kinds of limit behaviour of random sequences. The examples 
concern random series, conditional expectations, records and 
maxima of random sequences, versions of the law of the iterated 
logarithm and net convergence. The definitions of some of the 
notions are given in the examples themselves. For convenience we 
formulate one result here and give one definition. 


Kolmogorov three-series theorem. Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s and $X_n^{(c)} = X_n I_{[|X_n| \le c]}$ for some $c > 0$. A necessary condition for the convergence of $\sum_{n=1}^{\infty} X_n$ with probability 1 is that the series

$$\sum_{n=1}^{\infty} EX_n^{(c)}, \qquad \sum_{n=1}^{\infty} VX_n^{(c)}, \qquad \sum_{n=1}^{\infty} P[|X_n| > c]$$

converge for every $c > 0$. A sufficient condition is that these series are convergent for some $c > 0$.


The proof of this theorem and some useful corollaries can be 
found in the books by Breiman (1968), Chow and Teicher (1978), 
Shiryaev (1995). 

Now let us define the so-called net convergence (see Neveu 1975). Let $T$ be the set of all bounded stopping times with respect to the family $(\mathcal{F}_n, n \in \mathbb{N})$. Here $(\mathcal{F}_n)$ is a non-decreasing sequence of sub-$\sigma$-fields of $\mathcal{F}$ and a stopping time $\tau$ is a function with values in $[0, \infty]$ such that $[\tau = n] \in \mathcal{F}_n$ for each $n \in \mathbb{N}$. The family $(a_\tau, \tau \in T)$ of real numbers, called a net, is said to converge to the real number $b$ provided for every $\varepsilon > 0$ there is $\tau_0 \in T$ such that for all $\tau \in T$ with $\tau \ge \tau_0$ we have $|a_\tau - b| < \varepsilon$.

Each of the examples given below contains appropriate 
references for further reading. 


18.1. On the conditions in the Kolmogorov three-series 
theorem 

(i) Let $\{X_n, n \ge 1\}$ be independent r.v.s with $EX_n = 0$, $n \ge 1$. Then the condition $\sum_{n=1}^{\infty} VX_n < \infty$ implies that $\sum_{n=1}^{\infty} X_n$ converges a.s. Note that this is one of the simplest versions of the Kolmogorov three-series theorem.

Let us show that the condition $\sum_{n=1}^{\infty} VX_n < \infty$ is not necessary for the convergence of $\sum_{n=1}^{\infty} X_n$. Indeed, consider the sequence $\{X_n, n \ge 1\}$ of independent r.v.s where


BX, SS SP Xe Sar Sa, PSE Sa a a. 
Obviously $\sum_{n=1}^{\infty} VX_n = \infty$ but nevertheless the series $\sum_{n=1}^{\infty} X_n$ is convergent a.s. according to the Borel-Cantelli lemma.

(ii) The Kolmogorov three-series theorem yields the following result (see Chow and Teicher 1978): if $\{X_n, n \ge 1\}$ are independent r.v.s with $EX_n = 0$, $n \ge 1$, and

$$(1)\qquad \sum_{n=1}^{\infty} E\bigl[X_n^2 I_{[|X_n| \le 1]} + |X_n| I_{[|X_n| > 1]}\bigr] < \infty$$

then the series $\sum_{n=1}^{\infty} X_n$ converges a.s.
Let us clarify the role of condition (1) in the convergence of $\sum_{n=1}^{\infty} X_n$. For this purpose consider the sequence $\{\xi_n, n \ge 1\}$ of i.i.d. r.v.s with $P[\xi_1 = 1] = P[\xi_1 = -1] = \tfrac12$ and define $X_n = \xi_n/\sqrt{n}$, $n \ge 1$. It is easy to check that for any $r > 2$ the following condition holds:

$$(2)\qquad \sum_{n=1}^{\infty} E[|X_n|^r] < \infty.$$

Condition (2) can be considered in some sense similar to (1). However the series $\sum_{n=1}^{\infty} X_n$ diverges a.s. This shows that the power 2 in the first term of the summands in (1) is essential. Finally let us note that if condition (2) is satisfied for some $0 < r < 2$ then the series $\sum_{n=1}^{\infty} X_n$ does converge a.s. (see Loève 1978).


18.2. The independency condition is essential in the
Kolmogorov three-series theorem


Let us start with a direct consequence of the Kolmogorov three-series theorem (sometimes called the 'two-series' theorem). If $X_n$, $n \ge 1$, are independent r.v.s and the series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} \sigma_n^2$ with $a_n = EX_n$, $\sigma_n^2 = VX_n$, are convergent, then the random series $\sum_{n=1}^{\infty} X_n$ is convergent with probability 1.

Our goal is to show that the independency property for $X_n$, $n \ge 1$, is essential for this and similar results to hold.


(i) Let $\xi$ be a r.v. with $E\xi = 0$ and $0 < V\xi = b^2 < \infty$ (i.e. $\xi$ is non-degenerate). Define $X_n = \xi/n$, $n \ge 1$. Then $a_n = EX_n = 0$, $\sigma_n^2 = VX_n = b^2/n^2$, implying that

$$\sum_{n=1}^{\infty} a_n = 0 \quad\text{and}\quad \sum_{n=1}^{\infty} \sigma_n^2 = b^2 \sum_{n=1}^{\infty} (1/n^2) < \infty.$$

Hence two of the conditions in the above result are satisfied and one condition, the independence of $X_n$, $n \ge 1$, is not. Nevertheless the question about the convergence of the random series $\sum_{n=1}^{\infty} X_n$ is reasonable. Since

$$\sum_{n=1}^{\infty} X_n(\omega) = \xi(\omega) \sum_{n=1}^{\infty} \frac1n,$$

the series $\sum_{n=1}^{\infty} X_n(\omega)$ is convergent on the set $A = \{\omega : \xi(\omega) = 0\}$ and divergent on the set $A^c = \{\omega : \xi(\omega) \ne 0\}$. If the non-degenerate r.v. $\xi$ is such that $P(A) = p$ where $p$ is any number in $[0,1)$, we get a random series $\sum_{n=1}^{\infty} X_n$ which is convergent with probability $p$ (strictly less than 1) and divergent with probability $1 - p$ (strictly greater than 0).

(ii) In case (i) the dependence among the variables $X_n$, $n \ge 1$, is 'quite strong': any two of them are functionally related. Let us see if the independence of $X_n$, $n \ge 1$, can be weakened and replaced e.g. by the exchangeability property. We use the following modification of the Kolmogorov three-series theorem. If $X_n$, $n \ge 1$, are i.i.d. r.v.s with $EX_1 = 0$, $E[X_1^2] < \infty$ and $c_n$, $n \ge 1$, are real numbers with $\sum_{n=1}^{\infty} c_n^2 < \infty$, then the random series $\sum_{n=1}^{\infty} c_n X_n$ is convergent with probability 1.
Let us now consider the sequence of i.i.d. r.v.s $\xi_n$, $n \ge 1$, with $E\xi_n = 0$ and $E[\xi_1^2] < \infty$ and let $\eta$ be another r.v. with $E\eta = 0$, $0 < E[\eta^2] < \infty$ and independent of $\{\xi_n, n \ge 1\}$. Define the sequence $X_n$, $n \ge 1$, by

$$X_n = \xi_n + \eta, \quad n \ge 1.$$

Thus $X_n$, $n \ge 1$, is a sequence of identically distributed r.v.s with $EX_n = 0$ and $E[X_1^2] < \infty$. Obviously the variables $X_n$, $n \ge 1$, are not independent. However $X_n$, $n \ge 1$, is an exchangeable sequence. (See also Example 13.8.) Our goal is to study the convergence of the series $\sum_{n=1}^{\infty} c_n X_n$ where $c_n$, $n \ge 1$, satisfy the condition $\sum_{n=1}^{\infty} c_n^2 < \infty$. Choose $c_n$, $n \ge 1$, such that $c_n > 0$ for any $n$ and $\sum_{n=1}^{\infty} c_n = \infty$ (an easy case is $c_n = 1/n$). Since $c_n X_n = c_n \xi_n + c_n \eta$, we have

$$\sum_{n=1}^{\infty} c_n X_n = \sum_{n=1}^{\infty} c_n \xi_n + \eta \sum_{n=1}^{\infty} c_n.$$

The independence of $\xi_n$, $n \ge 1$, implies that the series $\sum_{n=1}^{\infty} c_n \xi_n$ is convergent a.s. Hence, in view of $\sum_{n=1}^{\infty} c_n = \infty$, the series $\sum_{n=1}^{\infty} c_n X_n$ is convergent on the set $A = \{\omega : \eta(\omega) = 0\}$ and divergent on $A^c = \{\omega : \eta(\omega) \ne 0\}$. For a preassigned $p \in [0, 1)$, take the r.v. $\eta$ such that $P(A) = p$. Then the random series $\sum_{n=1}^{\infty} c_n X_n$ of exchangeable (but not independent) variables is convergent with probability $p < 1$ and divergent with probability $1 - p > 0$.

We have seen in both cases (i) and (ii) the role of the independence property for random series to converge with probability 1. The same examples lead to one additional conclusion. According to the Kolmogorov 0-1 law, if $X_n$, $n \ge 1$, are independent r.v.s, then the set $\{\omega : \sum_{n=1}^{\infty} X_n(\omega) \text{ converges}\}$ has probability 0 or 1. Hence, if $X_n$, $n \ge 1$, are not independent we can obtain $P[\omega : \sum_{n=1}^{\infty} X_n(\omega) \text{ converges}] = p$ for arbitrarily given $p \in [0,1)$.


18.3. The interchange of expectations and infinite summation is
not always possible


Let us start with the formulation of a result showing that in some cases the operations of expectations and summation can be interchanged (see Chow and Teicher 1978). If $\{X_n, n \ge 1\}$ are non-negative r.v.s then

$$(1)\qquad E\Bigl[\sum_{n=1}^{\infty} X_n\Bigr] = \sum_{n=1}^{\infty} EX_n.$$


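Our aim now is to show that (1) is not true without the non-negativity of the variables $X_n$ even if the series $\sum_{n=1}^{\infty} X_n$ is convergent.

Consider $\{\xi_n, n \ge 1\}$ to be i.i.d. r.v.s with $P[\xi_1 = \pm 1] = \tfrac12$ and define the stopping time $\tau = \inf\{n \ge 1 : \sum_{k=1}^{n} \xi_k = 1\}$ where $\inf\{\emptyset\} = \infty$. Then it is easy to check that $P[\tau < \infty] = 1$. Setting $X_n = \xi_n I_{[\tau \ge n]}$, we get from the definition of $\tau$ that

$$\sum_{n=1}^{\infty} X_n = \sum_{n=1}^{\infty} \xi_n I_{[\tau \ge n]} = \sum_{n=1}^{\tau} \xi_n = 1 \quad\Rightarrow\quad E\Bigl[\sum_{n=1}^{\infty} X_n\Bigr] = 1.$$

However, the event $[\tau \ge n] \in \sigma\{\xi_1, \dots, \xi_{n-1}\}$, the r.v.s $\xi_n$ and $I_{[\tau \ge n]}$ are independent and from the properties of the expectation we obtain

$$EX_n = E\xi_n \cdot EI_{[\tau \ge n]} = 0, \quad n \ge 1.$$

Thus $\sum_{n=1}^{\infty} EX_n = 0$ and therefore (1) is not satisfied.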
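The two sides of (1) in Example 18.3 can be examined exactly on a finite horizon by enumerating all $\pm1$ paths. The sketch below assumes the construction $X_n = \xi_n I_{[\tau \ge n]}$ with $\tau$ the first time the partial sums reach 1; the horizon `N` and the function name `stopped_walk_summary` are choices made here for illustration.

```python
from fractions import Fraction
from itertools import product

def stopped_walk_summary(N):
    """Exact E[X_n], n = 1..N, and P[tau <= N], by enumerating all 2^N paths."""
    term_totals = [0] * N   # sum of X_{n+1} over all paths, index n = 0..N-1
    hits = 0                # number of paths with tau <= N
    for path in product((1, -1), repeat=N):
        s = 0
        for n, step in enumerate(path):
            term_totals[n] += step   # reached only while tau >= n+1
            s += step
            if s == 1:               # tau = n+1: later terms X_m are 0
                hits += 1
                break
    scale = 2 ** N
    return [Fraction(t, scale) for t in term_totals], Fraction(hits, scale)

means, hit_probability = stopped_walk_summary(10)
print(means)             # every E[X_n] is exactly 0
print(hit_probability)   # P[tau <= 10]; on this event X_1 + ... + X_tau = 1
```

Each term has mean zero, yet on the event $[\tau \le N]$, whose probability increases to 1 as $N$ grows, the full sum equals 1; the missing mass is carried by rare paths with large negative partial sums.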


18.4. A relationship between a convergence of random
sequences and convergence of conditional expectations


On the probability space $(\Omega, \mathcal{F}, P)$ we have given r.v.s $X$ and $X_n$, $n \ge 1$, all in the space $L^r$ (i.e. $r$-integrable) for some $r \ge 1$. Suppose $X_n \xrightarrow{L^r} X$ as $n \to \infty$. Then for any sub-$\sigma$-field $\mathcal{A} \subset \mathcal{F}$ we have $E[X_n|\mathcal{A}] \xrightarrow{L^r} E[X|\mathcal{A}]$ as $n \to \infty$ (e.g. see Neveu 1975 or Shiryaev 1995). This statement is a consequence of the Jensen inequality for conditional expectations. Obviously, we can ask the inverse question and the best way to answer it is to consider a specific example.

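Let $X_n$, $n \ge 1$, be i.i.d. r.v.s with $P[X_1 = 2c] = P[X_1 = 0] = \tfrac12$, where $c \ne 0$ is a fixed real number. Take also (a trivial) r.v. $X = c$: $P[X = c] = 1$. Then, if $\mathcal{A} = \{\emptyset, \Omega\}$, the trivial $\sigma$-field, we obviously get

$$E[X_n|\mathcal{A}] = EX_n = 2c\cdot\tfrac12 + 0\cdot\tfrac12 = c = E[X|\mathcal{A}] \quad\text{for any } n \ge 1.$$

Moreover because $X$, $X_n$ are bounded, then for any $r \ge 1$, one has

$$E[X_n|\mathcal{A}] \xrightarrow{L^r} E[X|\mathcal{A}] \quad\text{as } n \to \infty.$$

However $E[|X_n - X|^r] = \tfrac12|2c - c|^r + \tfrac12|0 - c|^r = |c|^r$ for all $n$, and therefore $X_n$ does not converge to $X$ in $L^r$ as $n \to \infty$.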


18.5. The convergence of a sequence of random variables does 
not imply that the corresponding conditional medians 
converge 


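Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\mathcal{F}_0 = \{\emptyset, \Omega\}$ the trivial $\sigma$-field and $\mathcal{D}$ a sub-$\sigma$-field of $\mathcal{F}$. If $X$ is a r.v., then the conditional median of $X$ with respect to $\mathcal{D}$ is defined as a $\mathcal{D}$-measurable r.v. $M$ such that

$$P[X \ge M\,|\,\mathcal{D}] \ge \tfrac12 \le P[X \le M\,|\,\mathcal{D}] \quad\text{a.s.}$$

Usually the conditional median is denoted by $\mu(X|\mathcal{D})$ (see Example 6.10).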
If $\{X_n, n \ge 1\}$ is a sequence of r.v.s which is convergent in a definite sense, then it is logical to expect that the corresponding sequence of the conditional medians also will be convergent. In this connection let us formulate the following result (see Tomkins 1975a). Let $\{X_n, n \ge 1\}$ and $\{M_n, n \ge 1\}$ be sequences of r.v.s such that for a given $\sigma$-field $\mathcal{D}$ we have $M_n = \mu(X_n|\mathcal{D})$ a.s. and there exist r.v.s $X$ and $M$ such that $X_n \xrightarrow{a.s.} X$ and $M_n \xrightarrow{P} M$ as $n \to \infty$. Then $M = \mu(X|\mathcal{D})$ a.s. We can now try to answer the question of whether the convergence of $\{X_n\}$ always implies convergence of the conditional medians $\{M_n\}$.


1 2 
Let é be ar.v. distributed uniformly on the interval (—2, 2), D = 
F ¢ (the trivial o-field) and define the sequence {X,, } by 


It is easy to see that Xn — +X asn > ow. Moreover, X,, has a 
unique median M, and M, = 0 or | accordingly as n is odd or 
even. But clearly the sequence {M,, } does not converge. (It would 


be useful for the reader to compare this example with the result 
cited above.) 


18.6. A sequence of conditional expectations can converge only
on a set of measure zero


If $(\Omega, \mathcal{F}, P)$ is a complete probability space, $(\mathcal{F}_n, n \in \mathbb{N})$ an increasing family of sub-$\sigma$-fields of $\mathcal{F}$, $\mathcal{F}_\infty = \lim_{n\to\infty} \mathcal{F}_n$, and $X$ a positive r.v., the following result holds (see Neveu 1975):

(1) $E[X|\mathcal{F}_n] \xrightarrow{a.s.} E[X|\mathcal{F}_\infty]$ outside the set $\{\omega : E[X|\mathcal{F}_n] = \infty \text{ for all } n\}$.

We shall show that this result cannot be improved. More precisely, we give an example of an a.s. finite $\mathcal{F}_\infty$-measurable r.v. $X$ such that $E[X|\mathcal{F}_n] = \infty$ a.s. for all $n \in \mathbb{N}$. Clearly in such a case the convergence in (1) holds only on a set of measure zero.


Let $\Omega = [0,1)$, $\mathcal{F} = \mathcal{B}_{[0,1)}$ and $P$ be the Lebesgue measure. Consider the increasing sequence $(\mathcal{F}_n)$ of sub-$\sigma$-fields of $\mathcal{F}$ where $\mathcal{F}_n$ is generated by the dyadic partition $\{[2^{-n}k, 2^{-n}(k+1)),\ 0 \le k < 2^n\}$, $n \in \mathbb{N}$. For each $n \in \mathbb{N}$ choose a positive measurable function $f_n : [0,1) \to \mathbb{R}^+$ of period $2^{-n}$ with


l 
[tiyaont aad fiysaleyden 2” 
OQ) () 


Since the sum $\sum_{n} I_{[f_n > 0]}$ is integrable, and hence a.s. finite, the series $\sum_{n} f_n$ contains no more than a finite number of non-zero terms for almost all $\omega$. Thus $X = \sum_{n} f_n$ is a positive r.v. which is finite a.s. On the other hand, for all $n \in \mathbb{N}$ and all $k$, $0 \le k < 2^n$, we have


NT gs "(k+1) 
/ xav> Of jes ea » | tue (Co 
Pla 2 


man mn 
by the periodicity of f,, . 
Therefore we have shown that $E[X|\mathcal{F}_n] = \infty$ for all $n \in \mathbb{N}$, meaning that the a.s. convergence in (1) holds only on a set of measure zero.


18.7. When is a sequence of conditional expectations 
convergent almost surely? 

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_n, n \ge 1\}$ an independent sequence of sub-$\sigma$-fields of $\mathcal{F}$, that is, for $k = 1, 2, \dots$ and $A_j \in \mathcal{F}_j$, $1 \le j \le k$, we have

$$P(A_1 A_2 \dots A_k) = P(A_1)P(A_2)\dots P(A_k).$$

Let $X$ be an integrable r.v. with $EX = a$ and let $X_n = E[X|\mathcal{F}_n]$. The following result is proved by Basterfield (1972): if $E[|X|\log^+|X|] < \infty$ then $P[X_n \to a \text{ as } n \to \infty] = 1$ ($\log x$ is defined for $x > 0$ and $\log^+ x = \log x$ if $x > 1$, and 0 if $0 < x \le 1$).

We aim to show that the assumption in this result cannot be weakened, e.g. it cannot be replaced by $E[|X|] < \infty$. To see this, consider a sequence $\{A_n, n \ge 1\}$ of independent events with $P(A_n) = 1/n$. Define the r.v. $X$ by


Cx 


m1 


where €,,, 1s the indicator function of the event A1A9 ... Ay. Since 


i 
Eé,, = P(A... Am) = P(A1)...P(Am) = — 


m! 


we obtain 


~ m! ae 1 | 
Te oe - a 


m— 1 


Moreover, it is not difficult to verify that

$$E[X\log^+ X] = \infty.$$

Consider now $\mathcal{F}_n = \{\emptyset, A_n, A_n^c, \Omega\}$ and $X_n = E[X|\mathcal{F}_n]$. We need to check if $X_n \xrightarrow{a.s.} a$ as $n \to \infty$. Since
E[X|A,] = (1/P(An)) [4 XdP = nf, X dP, replacing XY by 
>» m=1 $m+2 we arrive at the equality 


E[X|A,] = 3. 


However, $\sum_{n=1}^{\infty} P(A_n) = \infty$ and by the Borel-Cantelli lemma, almost all $\omega$ belong to infinitely many $A_n$. Therefore


limsup X, =244=a=EX. 


TL? OO 


Thus the condition $E[|X|\log^+|X|] < \infty$ cannot be replaced by $E[|X|] < \infty$ so as to preserve the convergence $X_n \xrightarrow{a.s.} a$ as $n \to \infty$.


18.8. The Weierstrass theorem for the unconditional 
convergence of a numerical series does not hold for a 
series of random variables 


Let $\sum_{n=1}^{\infty} a_n$ be an infinite series of real numbers. This series is said to converge unconditionally if $\sum_{k=1}^{\infty} a_{n_k} < \infty$ for every rearrangement $\{n_1, n_2, \dots\}$ of $\{1, 2, \dots\}$. (By rearrangement we understand a one-one map of $\mathbb{N}$ onto $\mathbb{N}$.) We say that the series $\sum_{n=1}^{\infty} a_n$ converges absolutely if $\sum_{n=1}^{\infty} |a_n| < \infty$. According to the classical Weierstrass theorem these two concepts, unconditional convergence and absolute convergence, are equivalent.

Thus we arrive at the question: what happens when considering random series $\sum_{n=1}^{\infty} X_n(\omega)$, that is series of r.v.s?

Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s defined on some probability space $(\Omega, \mathcal{F}, P)$. The series $\sum_{n=1}^{\infty} X_n$ is said to be a.s. unconditionally convergent if for every rearrangement $\{n_k\}$ of $\mathbb{N}$ we have $\sum_{k=1}^{\infty} X_{n_k} < \infty$ a.s. If $\sum_{n=1}^{\infty} |X_n| < \infty$ a.s., the given series is a.s. absolutely convergent.

So, bearing in mind the Weierstrass theorem, we could suppose that the concepts a.s. unconditional and a.s. absolute convergence are equivalent. However, as will be seen below, such a conjecture is not generally true.


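Consider the sequence $\{r_n, n = 0, 1, 2, \dots\}$ of the so-called Rademacher functions, that is $r_n(\omega) = \operatorname{sign}\sin(2^n\pi\omega)$, $0 \le \omega \le 1$, $n = 0, 1, \dots$ (see Lukacs 1975). Actually $r_n$ can also be written in the form

$$r_n(\omega) = \begin{cases} 1, & \text{if } 2k/2^n < \omega < (2k+1)/2^n, \\ -1, & \text{if } (2k+1)/2^n < \omega < (2k+2)/2^n, \\ 0, & \text{if } \omega = k/2^n, \end{cases}$$

where $k$ runs over the non-negative integers for which the corresponding points lie in $[0,1]$. Then $\{r_n\}$ is a sequence of independent r.v.s on the probability space $(\Omega, \mathcal{F}, P)$ with $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ the Lebesgue measure. Moreover, $r_n$ takes the values 1 and $-1$ with probability $\tfrac12$ each, $Er_n = 0$, $Vr_n = 1$.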
Now take any numerical sequence $\{a_n\}$ such that

$$\sum_{n=1}^{\infty} a_n^2 < \infty \quad\text{but}\quad \sum_{n=1}^{\infty} |a_n| = \infty.$$

For example, $a_n = (-1)^n/(n + 1)$. Using the sequence $\{r_n\}$ of the Rademacher functions and the numerical sequence $\{a_n\}$ we construct the series

$$(1)\qquad \sum_{n=1}^{\infty} a_n r_n(\omega).$$

Applying the Kolmogorov three-series theorem we easily conclude that this series is a.s. convergent. If $\{n_k\}$ is any rearrangement of $\mathbb{N}$ then the series $\sum_{k=1}^{\infty} a_{n_k} r_{n_k}(\omega)$ is also a.s. convergent.

However, $\sum_{n=1}^{\infty} a_n r_n(\omega)$ is not absolutely convergent since $|r_n(\omega)| = 1$ and $\sum_{n=1}^{\infty} |a_n| = \infty$.

Therefore the series (1) is a.s. unconditionally convergent but not a.s. absolutely convergent, and so these two concepts of convergence of random series are not equivalent.
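The ingredients of Example 18.8 are easy to inspect numerically. The sketch below assumes the coefficient choice $a_n = (-1)^n/(n+1)$ from the text and the interval form of the Rademacher functions; it compares the two partial sums and checks, on a dyadic grid, that $r_3$ has mean 0 and is orthogonal to $r_2$. The helper names `rademacher` and `a` are illustrative.

```python
from fractions import Fraction

def rademacher(n, omega):
    """r_n(omega) for non-dyadic omega in (0,1): +1 on even cells of width 2^-n, else -1."""
    return 1 if int(omega * 2**n) % 2 == 0 else -1

def a(n):
    """Coefficients a_n = (-1)^n / (n + 1): square-summable, not absolutely summable."""
    return (-1) ** n / (n + 1)

N = 100_000
sum_sq = sum(a(n) ** 2 for n in range(1, N + 1))    # stays below pi^2/6 - 1
sum_abs = sum(abs(a(n)) for n in range(1, N + 1))   # grows like log N

# Midpoints of the 2^8 dyadic cells: exact averages for r_n with n <= 8.
grid = [Fraction(2 * j + 1, 2**9) for j in range(2**8)]
mean_r3 = sum(rademacher(3, w) for w in grid) / len(grid)
mean_r2r3 = sum(rademacher(2, w) * rademacher(3, w) for w in grid) / len(grid)
print(sum_sq, sum_abs, mean_r3, mean_r2r3)
```

Since $|r_n| = 1$ almost everywhere, the absolute series $\sum |a_n r_n|$ is just `sum_abs`, which diverges, while the bounded `sum_sq` is what the three-series theorem needs for a.s. convergence.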


18.9. A condition which is sufficient but not necessary for the 
convergence of a random power series 


Let $a_n = a_n(\omega)$, $n = 0, 1, 2, \dots$, be a sequence of i.i.d. r.v.s. The random power series, that is a series of the type $\sum_{n=0}^{\infty} a_n(\omega) z^n$, is defined in the standard manner (see Lukacs 1975). As in the deterministic case (when $a_n$ are numbers), one of the basic problems is to find the so-called radius of convergence $r = r(\omega)$. This $r$ is a r.v. such that for all $|z| < r$ the series $\sum_{n=0}^{\infty} a_n(\omega) z^n$ is a.s. convergent. Moreover, $r(\omega) = \bigl(\limsup_{n\to\infty} \sqrt[n]{|a_n(\omega)|}\bigr)^{-1}$.

Among the variety of results concerning random power series, we formulate here the following (see Lukacs 1975). If $\{a_n, n \ge 0\}$ are i.i.d. r.v.s and the d.f. $F(x)$, $x \in \mathbb{R}^+$, of $|a_1|$ satisfies the condition

$$(1)\qquad \int_1^{\infty} \log x\,dF(x) < \infty$$

then the random power series $\sum_{n=0}^{\infty} a_n(\omega) z^n$ has a radius of convergence $r(\omega)$ such that $P[r(\omega) \ge 1] = 1$.

Let us show by a concrete example that condition (1) is not necessary for the existence of $r$ with $P[r \ge 1] = 1$. Take $\xi$ as a r.v. distributed uniformly on the interval $[0,1]$. Define $a_n$ by $a_n(\omega) = \exp(1/\xi(\omega))$. Then the common d.f. of $a_n$ is

$$F(x) = \begin{cases} 0, & \text{if } x < e, \\ 1 - (\log x)^{-1}, & \text{if } x \ge e. \end{cases}$$

Clearly

$$\int_1^{\infty} \log x\,dF(x) = \infty$$

and condition (1) is not satisfied. However, for any $\varepsilon > 0$ we have

$$P\bigl[\limsup_{n\to\infty}\,[|a_n| > (1 + \varepsilon)^n]\bigr] = P\bigl[\limsup_{n\to\infty}\,[0, 1/(n\log(1 + \varepsilon))]\bigr] = 0.$$

This relation, the definition of a radius of convergence and a result of Lukacs (1975) allow us to conclude that $P[r(\omega) < x] = 0$ for all $x \in (0,1]$. For $x = 1$ we get $P[r(\omega) \ge 1] = 1$. Thus condition (1) is not necessary for the random power series $\sum_{n=0}^{\infty} a_n(\omega) z^n$ to have a radius of convergence $r \ge 1$.
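The divergence of the integral in (1) for this d.f. is logarithmically slow, which a short computation makes visible. The sketch below assumes $F(x) = 1 - 1/\log x$ for $x \ge e$, so that $dF(x) = dx/(x\log^2 x)$ and $\int_e^T \log x\,dF(x) = \log\log T$; the quadrature in the variable $u = \log x$ is only a cross-check of this closed form, and all function names are illustrative.

```python
from math import e, log

def F(x):
    """Common d.f. of a_n = exp(1/xi), xi uniform on [0, 1]."""
    return 0.0 if x < e else 1.0 - 1.0 / log(x)

def truncated_log_moment(T):
    """Closed form of int_e^T log x dF(x) = log log T (for T >= e)."""
    return log(log(T))

def truncated_log_moment_numeric(T, steps=100_000):
    """Midpoint rule for int_1^{log T} du/u, i.e. the same integral with u = log x."""
    upper = log(T)
    h = (upper - 1.0) / steps
    return sum(h / (1.0 + (i + 0.5) * h) for i in range(steps))

def tail_event_probability(n, eps):
    """P[a_n > (1 + eps)^n] = P[xi < 1/(n log(1 + eps))], capped at 1."""
    return min(1.0, 1.0 / (n * log(1.0 + eps)))

print(truncated_log_moment(1e6), truncated_log_moment_numeric(1e6))
print(truncated_log_moment(1e300))          # still growing, without bound
print(tail_event_probability(1000, 0.1))    # shrinks to 0 as n grows
```

Because every $a_n$ here is the same function of $\xi$, the events $[a_n > (1+\varepsilon)^n]$ are nested intervals shrinking to the empty set, which is why their upper limit has probability 0 even though the probabilities are not summable.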


18.10. A random power series without a radius of convergence 
in probability 


As before consider a random power series and its partial sums

$$\sum_{n=0}^{\infty} a_n(\omega) z^n \quad\text{and}\quad U_N(z, \omega) = \sum_{n=0}^{N} a_n(\omega) z^n$$

where the coefficients $a_n(\omega)$, $n = 0, 1, \dots$, are given r.v.s. If $U_N(z)$ are convergent in some definite sense as $N \to \infty$ then the random power series is said to converge in the same sense. There are several interesting results about the existence of the radii of convergence if we consider a.s. convergence, convergence in probability and $L^r$-convergence. Note that a.s. convergence was treated in Example 18.9.

Now we aim to show that no circle of convergence exists for convergence in probability of a random power series. For this let $\{a_n(\omega), n \ge 0\}$ be independent r.v.s with

$$P[a_0 = 0] = 1, \qquad P[a_n = n^n] = 1/n, \qquad P[a_n = 0] = 1 - 1/n, \quad n \ge 1.$$

It is easy to check that the power series $\sum_{n=0}^{\infty} a_n(\omega) z^n$ is a.s. divergent, that is its radius of convergence is $r_0 = 0$. Clearly, this series cannot converge in probability or in $L^r$-sense.

From the definition of $a_n$ we find that

$$P[|a_n(\omega) z^n| > \varepsilon] = P[a_n(\omega) > \varepsilon |z|^{-n}] \le 1/n \to 0 \quad\text{as } n \to \infty.$$

Define another power series, say $\sum_{n=0}^{\infty} b_n(\omega) z^n$, whose coefficients $b_n$ are given by $b_0 = a_0$ and $b_n = a_n - a_{n-1}$ for $n \ge 1$. Obviously, in the sense of convergence in probability,

$$\lim_{N\to\infty} \sum_{n=0}^{N} b_n = \lim_{N\to\infty} a_N = 0.$$

n=0 
Furthermore, we have

$$(1)\qquad \sum_{n=0}^{N} b_n(\omega) z^n = a_N(\omega) z^N + (1 - z) \sum_{n=0}^{N-1} a_n(\omega) z^n.$$

It is clear that the series $\sum_{n=0}^{\infty} b_n(\omega) z^n$ converges in probability at least at two points, namely $z = 0$ and $z = 1$. If we suppose that it is convergent for a point $z$ such that $z \ne 0$ and $z \ne 1$, we derive from (1) that

$$\sum_{n=0}^{N-1} a_n(\omega) z^n = \frac{1}{1 - z}\Bigl[\sum_{n=0}^{N} b_n(\omega) z^n - a_N(\omega) z^N\Bigr]$$

which must also converge in probability as $N \to \infty$. However, this contradicts the fact that $r_0 = 0$.


Therefore in general the random power series has no circle of convergence in probability.

Finally, note that Arnold (1967) characterized probability spaces in which every random power series has a circle of convergence in probability. Such a property holds iff the probability space is atomic.
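Identity (1) of Example 18.10 is a summation by parts and holds for any numerical coefficients, so it can be verified exactly on one realization. The sketch below assumes an arbitrary illustrative realization of $a_0, \dots, a_5$ (values $0$ or $n^n$, as in the example) and a rational point $z$; the function name `check_identity` is hypothetical.

```python
from fractions import Fraction

def check_identity(a, z):
    """Both sides of sum_{n<=N} b_n z^n = a_N z^N + (1-z) sum_{n<N} a_n z^n."""
    N = len(a) - 1
    # b_0 = a_0 and b_n = a_n - a_{n-1} for n >= 1
    b = [a[0]] + [a[n] - a[n - 1] for n in range(1, N + 1)]
    lhs = sum(b[n] * z**n for n in range(N + 1))
    rhs = a[N] * z**N + (1 - z) * sum(a[n] * z**n for n in range(N))
    return lhs, rhs

# One possible realization: a_1 = 1^1, a_3 = 3^3, a_5 = 5^5, the others 0.
a_sample = [0, 1, 0, 27, 0, 3125]
lhs, rhs = check_identity(a_sample, Fraction(2, 3))
print(lhs, rhs, lhs == rhs)
```

At $z = 1$ the right-hand side reduces to $a_N$, which is why the $b$-series converges in probability there although the $a$-series diverges for every $z \ne 0$.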


18.11. Two sequences of random variables can obey the same 
strong law of large numbers but one of them may not be 
in the domain of attraction of the other 


Let $\{X_k, k \ge 1\}$ and $\{Y_k, k \ge 1\}$ each be a sequence of i.i.d. r.v.s. Omitting subscripts, we say that $X$ and $Y$ obey the same SLLN if for each number sequence $\{a_n, n \ge 1\}$ with $0 < a_n \uparrow \infty$ either


ae 3 Xe= O(d,,) and Sie. la} ais: 
k=1 
or 
(2) lim sup — 22 X, = oo and lim sup — as ¥;, == 66 “As: 
Theo Os (Ly, = Th 30 ln — i| 


We also need the following result of Stout (1979): $X$ and $Y$ obey the same SLLN iff


(3) P|Y| > 2]/P||X| > x] = O(@) asx > ow. 


Note that the statement that two r.v.s $X$ and $Y$, or more exactly two sequences $\{X_k, k \ge 1\}$ and $\{Y_k, k \ge 1\}$, obey the same SLLN is closely related to another statement involving the so-called domain of attraction. Let $U$ and $V$ be r.v.s with d.f.s $G$ and $H$ and ch.f.s $\phi$ and $\psi$ respectively. Suppose $H$ is a stable distribution with index $\gamma$ (see Feller 1971; Zolotarev 1986). We say that $U$ belongs to the domain of normal attraction of $V$ if for suitable constants $b_n$ and $c_n = c\,n^{1/\gamma}$ the distribution of $(1/c_n)\sum_{k=1}^{n} U_k - b_n$ tends to $H$ as $n \to \infty$, or in terms of the corresponding ch.f.s:

$$\lim_{n\to\infty} \exp(-itb_n)\,\phi^n(t/c_n) = \psi(t), \quad t \in \mathbb{R}^1.$$

We write $U \in N(\gamma)$ to denote that $U$ is in the domain of normal attraction of a stable law with index $\gamma$. Now let $X \in N(\gamma_x)$ and $Y \in N(\gamma_y)$ where $\gamma_x < 2$, $\gamma_y < 2$. Then, as a consequence of a result by Gnedenko and Kolmogorov (1954), we obtain that $X$ and $Y$ obey the same SLLN iff $\gamma_x = \gamma_y$.


Thus we come to the question: Can a r.v. $Y$ fail to be in the domain of normal attraction of a stable law $X$ and yet obey the same SLLN as $X$? By an example we show that the answer is positive. Consider a r.v. $X$ with a Cauchy distribution and a r.v. $Y$ whose d.f. $F$ is given by
(4) F(z) = i — (x+3)—1![2 + sin(log x)], 7 : 7 0 

ifr < 0. 

It is easy to check that $X$ and $Y$ satisfy condition (3) and hence $X$ and $Y$ obey the same SLLN in the sense of (1) and (2). According to a result by Gnedenko and Kolmogorov (1954) the r.v. $Y$ is in the domain of attraction of a Cauchy distribution only if

$$(5)\qquad P[|Y| > x] = (\alpha + \beta(x))/x$$

for some $\alpha = \text{constant} > 0$ and $\beta(x) \to 0$ as $x \to \infty$. However, the d.f. $F$ given by (4) does not satisfy (5).

Therefore $Y$ is not in the domain of normal attraction of the Cauchy-distributed r.v. $X$ despite the fact that $X$ and $Y$ obey the same SLLN.


18.12. Does a sequence of random variables always imitate 
normal behaviour? 


Let $F(x)$, $x \in \mathbb{R}^1$, be a d.f. with zero mean and variance 1. Consider the sequence $\{X_n, n \ge 1\}$ of i.i.d. r.v.s whose d.f. is $F$, and another sequence of independent r.v.s $\{Y_n, n \ge 1\}$ each distributed N(0,1). We also have a non-decreasing sequence $\{a_n, n \ge 1\}$ of real numbers. As usual, let $S_n = X_1 + \dots + X_n$. We say that the sequence $\{S_n, n \ge 1\}$ (generated by $\{X_n\}$) imitates normal behaviour to within $\{a_n\}$ if there is a probability space with r.v.s $\{X_n\}$ and $\{Y_n\}$ defined on it such that $\{X_n\}$ are i.i.d. with a common d.f. $F$, $\{Y_n\}$ are independent N(0,1) and

$$(1)\qquad \frac{1}{a_n}\bigl[(X_1 + \dots + X_n) - (Y_1 + \dots + Y_n)\bigr] \xrightarrow{a.s.} 0 \quad\text{as } n \to \infty.$$


Note that the first result of this type was obtained by Strassen (1964), who showed that every sequence $\{S_n\}$ with $EX_1 = 0$ and $E[X_1^2] = 1$ imitates normal behaviour to within $\{a_n\}$ with $a_n = (n\log\log n)^{1/2}$. He used this result to prove the law of iterated logarithm (LIL) for all such sequences. The question now is whether it is possible to choose a sequence $\{a_n\}$ 'smaller' than $\{(n\log\log n)^{1/2}\}$ and preserve the property described by (1). Some results in this direction can be found in Breiman (1967). Our aim now is to show that the condition on $\{a_n\}$, that is $a_n = (n\log\log n)^{1/2}$, cannot be weakened too much. More precisely, let us show that the sequence $\{S_n\}$ defined above does not imitate normal behaviour to within $\{b_n\}$ where $b_n = n^{1/2}$.

Firstly, define the sequence $\{n_k, k \ge 2\}$ by

$$n_{k+1} - n_k = n_{k+1}/g(n_{k+1}) \quad\text{where } g(n_k) = \beta\log k,\ \beta > 2.$$

Thus the differences $n_{k+1} - n_k$, $k \ge 2$, are increasing.

Suppose now that $\{S_n\}$ imitates normal behaviour to within $\{b_n\}$. Then for $Z_n = \zeta_1 + \dots + \zeta_n$, sums of independent N(0,1) r.v.s, the series


YP lems Ces], Gti SP Zien 95, Creel 
ay =) 


must converge or diverge simultaneously as a consequence of (1). Take $X_j = \eta_j\theta_j$ where $\eta_1, \theta_1, \eta_2, \theta_2, \dots$ are all mutually independent, $\theta_j \sim N(0,1)$, $E\eta_1 = 0$, $E[\eta_1^2] = 1$ and the distribution of $\eta_1$ will be specified later. We have


(2) PIZ nx pi—ne > V/Me+1]) © (9(Me+41))- ca exp|— 59(M-41)]- 


For the sequence $\{S_n\}$ we find

(3) $P[S_{n_{k+1}-n_k} > \sqrt{n_{k+1}} \mid \eta_1, \eta_2, \ldots] \asymp (u_k/n_{k+1})^{1/2} \exp[-n_{k+1}/(2u_k)]$

where $u_k = \eta_1^2 + \cdots + \eta_{n_{k+1}-n_k}^2$. We can take
$\eta_1$ distributed such that

‘ C 
(4) 7 ne > ng(n) >Pp U( ni. > ng( n)) > aun) 


where $h(n) = (\log n)(\log\log n)^{1+\delta}$ and $\delta > 0$. From
(3) and (4) it follows that

(5) $P[S_{n_{k+1}-n_k} > \sqrt{n_{k+1}}] \ge c/[\sqrt{g(n_{k+1})}\,h(n_{k+1})]$.


Taking (2) into account we find that

$\sum_{k=2}^{\infty} P[Z_{n_{k+1}-n_k} > \sqrt{n_{k+1}}] < \infty$.

On the other hand, since $\log n_{k+1} \sim k/(\beta \log k)$, for any
fixed $\delta$, $0 < \delta < \tfrac{1}{2}$, we obtain from (5) that

$\sum_{k=2}^{\infty} P[S_{n_{k+1}-n_k} > \sqrt{n_{k+1}}] = \infty$.

However, these two series must converge or diverge together. This
contradiction shows that the sequence $\{S_n\}$ does not imitate
normal behaviour to within $\{b_n\}$ where $b_n = \sqrt{n}$. Therefore
in the result of Strassen (1964) the sequence
$a_n = (n \log\log n)^{1/2}$ cannot be replaced by $b_n = n^{1/2}$.


18.13. On the Chover law of iterated logarithm 
Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s with a d.f. $F$.
Denote

$S_n = X_1 + \cdots + X_n$, $\xi_n = S_n/b_n$, $\eta_n = |\xi_n|^{1/\log\log n}$

where $\{b_n, n \ge 1\}$ are norming constants, $b_n > 0$, $n \ge 1$.
It is interesting to study the asymptotic behaviour of $\eta_n$ as
$n \to \infty$. For example, Vasudeva (1984) has proved the following
result. Suppose there exists a sequence $\{b_n\}$ such that
$\xi_n \xrightarrow{d} \xi$ as $n \to \infty$ where $\xi$ is a stable
r.v. with index $\gamma$, $0 < \gamma < 2$. Then
$\eta_n \xrightarrow{\text{a.s.}} \rho$ where $\rho$ is a definite
number in the interval $[0, \infty)$.
Let us note that the a.s. convergence of $\{\eta_n\}$ is known as the
Chover law of iterated logarithm (for references and details see
Vasudeva 1984).

One can ask whether it is necessary to assume the weak
convergence of $\{\xi_n\}$ in order to get the a.s. convergence of
$\{\eta_n\}$. Our aim now is to describe a sequence of r.v.s
$\{X_n, n \ge 1\}$ such that $\xi_n = (X_1 + \cdots + X_n)/b_n$, for
given $\{b_n\}$, fails to converge weakly, but nevertheless

$\eta_n = |\xi_n|^{1/\log\log n} \xrightarrow{\text{a.s.}} \text{constant} = e^{1/\gamma}$ as $n \to \infty$.


For this take the function $F(x)$, $x \in \mathbb{R}^1$, where
eee, | 
F(—x) =1- F(x) = ‘7 = ate 7 
x ¥*(1+ 4sin(logz)), ifa>1. 
It is easy to see that $F$ is a d.f. Let $\{X_n, n \ge 1\}$ be i.i.d.
r.v.s whose common d.f. is $F$. Choose $b_n = n^{1/\gamma}$,
$n \ge 1$, as a sequence of norming constants. Note that for all
$x > 0$ we have


bol i es 
~ 


Then for $\eta_n = |S_n/b_n|^{1/\log\log n}$, $n \ge 3$ (with
$b_n = n^{1/\gamma}$), we find the following two relations:

$P[|S_n| > b_n (\log n)^{(1-\varepsilon)/\gamma}] \ge c_1/(n(\log n)^{1-\varepsilon})$,

$P[|S_n| > b_n (\log n)^{(1+\varepsilon)/\gamma}] \le c_2/(n(\log n)^{1+\varepsilon})$

valid for any $\varepsilon \in (0,1)$ and all $n \ge n_0$, where $n_0$
is a fixed natural number. By the Borel-Cantelli lemma and a result by
Feller (1946) we find that

$P[|S_n| > b_n (\log n)^{(1-\varepsilon)/\gamma} \text{ i.o.}] = 1$, $P[|S_n| > b_n (\log n)^{(1+\varepsilon)/\gamma} \text{ i.o.}] = 0$.

Therefore

$P[\lim_{n \to \infty} \eta_n = e^{1/\gamma}] = 1$.

Applying a result of Zolotarev and Korolyuk (1961) we see that
the sequence $\{\xi_n\}$ cannot converge weakly (to a non-degenerate
r.v.) for any choice of the norming constants $\{b_n\}$.


18.14. On record values and maxima of a sequence of random 
variables 


Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s with a common
d.f. $F$. Recall that $X_k$ is said to be a record value of $\{X_n\}$
iff $X_k > \max\{X_1, \ldots, X_{k-1}\}$. By convention $X_1$ is a
record. Define the r.v.s $\{\tau_n, n \ge 0\}$ by

$\tau_0 = 0$, $\tau_n = \min\{k : k > \tau_{n-1},\ X_k > \max\{X_1, \ldots, X_{k-1}\}\}$.

Obviously the variables $\tau_n$ are the indices at which record values
occur. Further, we shall analyse some properties of two sequences,
the records $\{X_{\tau_n}, n \ge 1\}$ and the sequence of maxima
$\{M_n, n \ge 1\}$ where $M_n = \max\{X_1, \ldots, X_n\}$.
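The two sequences just introduced can be made concrete with a small sketch. The function names and the sample values below are illustrative assumptions, not part of the original text; the code only mirrors the definitions of record times and running maxima.

```python
def record_times(xs):
    """Return the 1-based indices tau_1 < tau_2 < ... at which records occur.

    X_k is a record if it strictly exceeds all earlier values;
    X_1 is a record by convention.
    """
    times = []
    current_max = float("-inf")
    for k, x in enumerate(xs, start=1):
        if x > current_max:  # strict inequality, as in the definition
            times.append(k)
            current_max = x
    return times


def running_maxima(xs):
    """Return the list M_1, M_2, ..., where M_n = max(X_1, ..., X_n)."""
    maxima = []
    current_max = float("-inf")
    for x in xs:
        current_max = max(current_max, x)
        maxima.append(current_max)
    return maxima


# Hypothetical sample path, chosen only for illustration.
sample = [0.3, 0.1, 0.7, 0.7, 0.2, 0.9]
taus = record_times(sample)
records = [sample[k - 1] for k in taus]
maxima = running_maxima(sample)
```

At every record time the record value coincides with the running maximum, which is why the two sequences are so closely related.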

The sequence of r.v.s $\{\xi_n, n \ge 1\}$ is called stable if there
exist norming constants $\{b_n, n \ge 1\}$ such that
$\xi_n/b_n \to 1$ as $n \to \infty$. If the convergence is with
probability 1, then $\{\xi_n\}$ is a.s. stable, while if the
convergence is in probability, we say that $\{\xi_n\}$ is stable in
probability.

Let us formulate a result connecting the sequences $\{X_{\tau_n}\}$
and $\{M_n\}$ (see Resnik 1973): if $\{X_{\tau_n}\}$ is stable in
probability, then the same holds for $\{M_n\}$.

Note firstly that the function $h(x) = -\log(1 - F(x))$ and its
inverse function $h^{-1}(x) = \inf\{y : h(y) > x\}$ play an important
role in studying records and maxima of random sequences. In
particular the above result of Resnik has the following precise
formulation: as $n \to \infty$

$X_{\tau_n}/h^{-1}(n) \xrightarrow{P} 1 \;\Rightarrow\; M_n/h^{-1}(\log n) \xrightarrow{P} 1$.

This naturally raises the question of whether the converse is
true. By an example we show that the answer is negative. Take the
function $h(x) = (\log x)^2$, $x \ge 1$, and let $\{X_n\}$ be i.i.d.
r.v.s with d.f. $F$ corresponding to this $h$. As above, $\{M_n\}$ and
$\{X_{\tau_n}\}$ are the sequences of the maxima and the records
respectively. Since the function
$h^{-1}(\log y) = \exp[(\log y)^{1/2}]$ is slowly varying, according
to Gnedenko (1943), $\{M_n\}$ is stable in probability. Moreover,
from a result by Resnik (1973), $\{M_n\}$ is a.s. stable. Nevertheless,
the sequence of records $\{X_{\tau_n}\}$ is not stable in probability
since the function $h^{-1}((\log y)^2) = y$ (compare with the result
cited above) is not slowly varying and this condition (see again
Resnik 1973) is necessary for $\{X_{\tau_n}\}$ to be stable.
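The contrast between the two functions in this example can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names, the constant c = 2 and the sample arguments are hypothetical choices. A function L is slowly varying if L(cy)/L(y) tends to 1 as y grows, for each fixed c > 0.

```python
import math


def maxima_function(y):
    """h^{-1}(log y) = exp(sqrt(log y)) for h(x) = (log x)^2, y > 1."""
    return math.exp(math.sqrt(math.log(y)))


def records_function(y):
    """h^{-1}((log y)^2) = y for the same h."""
    return y


def ratio(f, c, y):
    """The quotient f(c*y)/f(y) appearing in the definition of slow variation."""
    return f(c * y) / f(y)


# Hypothetical evaluation points, growing rapidly.
points = [1e3, 1e30, 1e300]
maxima_ratios = [ratio(maxima_function, 2.0, y) for y in points]
records_ratios = [ratio(records_function, 2.0, y) for y in points]
```

The ratios for exp(sqrt(log y)) decrease towards 1, while for the identity function the ratio stays equal to c, so the identity is not slowly varying.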


Chapter 4 


Stochastic Processes 


Courtesy of Professor A. T. Fomenko of Moscow University.


SECTION 19. BASIC NOTIONS ON STOCHASTIC 
PROCESSES 


Let $(\Omega, \mathcal{F}, P)$ be a probability space, $T$ a subset of
$\mathbb{R}^+$ and $(E, \mathcal{E})$ a measurable space. The family
$X = (X_t, t \in T)$ of random variables on $(\Omega, \mathcal{F}, P)$
with values in $(E, \mathcal{E})$ is called a stochastic process (or
a random process, or simply a process). We call $T$ the parameter
set (or time set, or index set) and $(E, \mathcal{E})$ the state space
of the process $X$. For every $\omega \in \Omega$, the mapping from $T$
into $E$ defined by $t \mapsto X_t(\omega)$ is called a trajectory
(path, realization) of the process $X$. We shall restrict ourselves to
the case of real-valued processes, that is processes whose state space
is $E = \mathbb{R}^1$ and $\mathcal{E} = \mathcal{B}^1$. The index
set $T$ will be either discrete (a subset of $\mathbb{N}$) or some
interval in $\mathbb{R}^+$.
Some classes of stochastic processes can be characterized by the
family $\mathcal{P}$ of the finite-dimensional distributions. Recall that


The following fundamental result (Kolmogorov theorem) is
important in this analysis. Let $\mathcal{P}$ be a family of
finite-dimensional distributions,


Pi Bic AO adit Bin). Ly iene din CL; Ripe tty eR 


and satisfies the compatibility (consistency) condition (see
Example 2.3). Then there exists a probability space
$(\Omega, \mathcal{F}, P)$ with a stochastic process
$X = (X_t, t \in T)$ defined on it such that $X$ has $\mathcal{P}$
as its family of the finite-dimensional distributions.

Let $T$ be a finite or infinite interval of $\mathbb{R}^+$. We say
that the process $X = (X_t, t \in T)$ is continuous (more exactly,
almost surely continuous) if almost all of its trajectories
$X_t(\omega)$, $t \in T$, are continuous functions. In this case we say
that $X$ is a process in the space $C(T)$ (of the continuous functions
from $T$ to $\mathbb{R}^1$). Further, $X$ is said to be
right-continuous if almost all of its trajectories are
right-continuous functions: that is, for each $t$, $X_t = X_{t+}$ where
$X_{t+} := \lim_{s \downarrow t} X_s$. In this case, if the left-hand
limits exist for each time $t$, the process $X$ is without
discontinuity of second kind and we say that $X$ is a process in the
space $D(T)$ (the space of all functions from $T$ to $\mathbb{R}^1$
which are right-continuous and have left-hand limits). We can define
the left-continuity of $X$ analogously.


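Let $(\mathcal{F}_t, t \in \mathbb{R}^+)$ be an increasing family of
sub-$\sigma$-fields of the basic $\sigma$-field $\mathcal{F}$, that is
$\mathcal{F}_t \subset \mathcal{F}$ for each $t \in \mathbb{R}^+$ and
$\mathcal{F}_s \subset \mathcal{F}_t$ if $s < t$. The family
$(\mathcal{F}_t, t \in \mathbb{R}^+)$ is called a filtration of
$\Omega$. For each $t \in \mathbb{R}^+$ define

$\mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s$, $\mathcal{F}_{t-} = \bigvee_{s < t} \mathcal{F}_s$.

(Recall that $\bigvee_{s < t} \mathcal{F}_s$ denotes the minimal
$\sigma$-field including all $\mathcal{F}_s$, $s < t$.) For $t = 0$ we
set $\mathcal{F}_{0-} = \mathcal{F}_0$ and
$\mathcal{F}_{\infty} = \bigvee_{t \in \mathbb{R}^+} \mathcal{F}_t$.
The filtration $(\mathcal{F}_t, t \in \mathbb{R}^+)$ is
right-continuous if $\mathcal{F}_t = \mathcal{F}_{t+}$ for each $t$,
left-continuous if $\mathcal{F}_t = \mathcal{F}_{t-}$ for each $t$, and
continuous if it is both left-continuous and right-continuous. We say
that the process $X = (X_t, t \in \mathbb{R}^+)$ is adapted to the
filtration $(\mathcal{F}_t, t \in \mathbb{R}^+)$ if for each
$t \in \mathbb{R}^+$ the r.v. $X_t$ is $\mathcal{F}_t$-measurable. In
this case we write simply that $X$ is $(\mathcal{F}_t)$-adapted.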

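The quadruple $(\Omega, \mathcal{F}, (\mathcal{F}_t, t \in \mathbb{R}^+), P)$
is called a probability basis. The phrase '$X = (X_t, t \in \mathbb{R}^+)$
is a process on $(\Omega, \mathcal{F}, (\mathcal{F}_t, t \in \mathbb{R}^+), P)$'
means that $(\Omega, \mathcal{F}, P)$ is a probability space,
$(\mathcal{F}_t, t \in \mathbb{R}^+)$ is a filtration, $X$ is a process
on $(\Omega, \mathcal{F}, P)$ and $X$ is $(\mathcal{F}_t)$-adapted. If
the filtration $(\mathcal{F}_t, t \in \mathbb{R}^+)$ is
right-continuous and is completed by all subsets in $\Omega$ of
$P$-measure zero, we say that this filtration satisfies the usual
conditions.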


Other important notions and properties concerning the stochastic 
processes will be introduced in the examples themselves. 

The reader will find systematic presentations of the basic theory 
of stochastic processes in many books (see Doob 1953, 1984; 
Blumenthal and Getoor 1968; Gihman and Skorohod 1974/1979; 
Ash and Gardner 1975; Prohorov and Rozanov 1969; Dellacherie 
1972; Dellacherie and Meyer 1978, 1982; Loéve 1978; Wentzell 
1981; Metivier 1982; Jacod and Shiryaev 1987; Revuz and Yor 
1991; Rao 1995). 


19.1. Is it possible to find a probability space on which 
any stochastic process can be defined? 


Let $(\Omega, \mathcal{F}, P)$ be a fixed probability space and
$X = (X_t, t \in T \subset \mathbb{R}^+)$ be a stochastic process. It
is quite natural to ask whether $X$ can be defined on this space. Our
motive for asking such a question will be clear if we recall the
following result (see Ash 1972): there exists a universal probability
space on which all possible random variables can be defined.

By the two examples considered below we show that some 
difficulties can arise when trying to extend this result to stochastic 
processes. 


(i) Suppose the answer to the above question is positive and
$(\Omega, \mathcal{F}, P)$ is such a space. Note that $\Omega$ is
fixed, $\mathcal{F}$ is fixed and clearly the cardinality of
$\mathcal{F}$ is less than or equal to $2^{\Omega}$. Choose the index
set $T$ such that its cardinality is greater than that of
$\mathcal{F}$ and consider the process $X = (X_t, t \in T)$ where
$X_t$ are independent r.v.s with
$P[X_t = 0] = P[X_t = 1] = \tfrac{1}{2}$ for each $t \in T$. However,
if $t_1 \ne t_2$, $t_1, t_2 \in T$, the events $[X_{t_1} = 1]$ and
$[X_{t_2} = 1]$ cannot be equivalent, since this would give

$\tfrac{1}{2} = P[X_{t_1} = 1] = P[X_{t_1} = 1, X_{t_2} = 1] = P[X_{t_1} = 1]\,P[X_{t_2} = 1] = \tfrac{1}{4}$.

This contradiction shows that $\mathcal{F}$ must contain events whose
number is greater than or at least equal to the cardinality of $T$.
But this contradicts the choice of $T$.


(ii) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$
be the Lebesgue measure. We shall show that on this space
$(\Omega, \mathcal{F}, P)$ there does not exist a process
$X = (X_t, t \in [0, 1])$ such that the variables $X_t$ are
independent and $X_t$ takes the values 0 and 1 with probability
$\tfrac{1}{2}$ each. (Compare this with case (i) above.)

Suppose $X$ does exist on $(\Omega, \mathcal{F}, P)$. Then
$E|X_t| < \infty$ for every $t \in [0, 1]$. Let $J$ be the countable
set of simple r.v.s of the type $\sum_k c_k 1_{A_k}$, where $c_k$ are
rational numbers and $\{A_k\}$ are finite partitions of the interval
$[0, 1]$ into subintervals with rational endpoints. Since
$E|X_t| < \infty$ then according to Billingsley (1995) there is a r.v.
$Y_t \in J$ with $E[|X_t - Y_t|] < \tfrac{1}{4}$. (Instead of
$\tfrac{1}{4}$ we could take any $\varepsilon > 0$.) However, for
arbitrary $s \ne t$ we have $E[|X_s - X_t|] = \tfrac{1}{2}$, implying
by the triangle inequality that $E[|Y_s - Y_t|] > 0$ for all
$s \ne t$. Hence the variables $Y_t$, $t \in [0, 1]$, are all
different. But there are only countably many variables in $J$, while
$[0, 1]$ is uncountable, and this is a contradiction.


19.2. What is the role of the family of finite-dimensional 
distributions in constructing a stochastic process 
with specific properties? 


Let PSPS? kt Gide ee GR) be a 
compatible family of finite-dimensional distributions. Then we can
always find a probability space $(\Omega, \mathcal{F}, P)$ and a
process $X = (X_t, t \in T)$ defined on it with just $\mathcal{P}$ as
a family of its finite-dimensional distributions. Note, however, that
this result says nothing about any other properties of $X$. Now we aim
to clarify whether the compatibility conditions are sufficient to
define a process satisfying some preliminary prescribed properties.

We are given the compatible family $\mathcal{P}$ and the following
family of functions $\{X_t(\omega), \omega \in \Omega, t \in T\}$.
Denote by $\mathcal{A}$ and $\mathcal{F}$ respectively the smallest
Borel field and the smallest $\sigma$-field with respect to which
every $X_t$ is measurable.

Since every set in $\mathcal{A}$ has the form
$A = \{\omega : (X_{t_1}(\omega), \ldots, X_{t_n}(\omega)) \in B\}$
where $B \in \mathcal{B}^n$, by the relation

(1) $P(A) = P\{\omega : (X_{t_1}(\omega), \ldots, X_{t_n}(\omega)) \in B\} = \int_B dP_{t_1,\ldots,t_n}(x_1, \ldots, x_n)$

we define a probability measure $P$ on $(\Omega, \mathcal{A})$ and
this measure is additive. However, it can happen that $P$ is not
$\sigma$-additive and in this case we cannot extend $P$ from
$(\Omega, \mathcal{A})$ to $(\Omega, \mathcal{F})$ (recall that
$\mathcal{F}$ is the $\sigma$-field generated by $\mathcal{A}$) to get
a process $(X_t, t \in T)$ with the prescribed finite-dimensional
distributions.

Let us illustrate this by an example. Take $T = [0, 1]$ and
$\Omega = C[0, 1]$, the space of all continuous functions on $[0, 1]$.
Let $X_t(\omega)$ be the coordinate function:
$X_t(\omega) = \omega(t)$ where
$\omega = \{\omega(t), t \in [0, 1]\} \in C[0, 1]$. Suppose the family
$\mathcal{P} = \{P_{t_1,\ldots,t_n}\}$ is defined as follows:

(2) $P_{t_1,\ldots,t_n}(x_1, \ldots, x_n) = \prod_{k=1}^{n} \int_{-\infty}^{x_k} g(u)\,du$

where $g(u)$, $u \in \mathbb{R}^1$, is any probability density
function which is symmetric with respect to zero. It is easy to see
that this family $\mathcal{P}$ is compatible. Further, the measure $P$
which we want to find must satisfy the relation

$P[X_t > \varepsilon, X_s < -\varepsilon] = \left(\int_{\varepsilon}^{\infty} g(u)\,du\right)^2$ for any $\varepsilon > 0$.

Since $\Omega = C[0, 1]$, the sets
$A_n = \{\omega : X_t(\omega) > \varepsilon,\ X_{t+1/n}(\omega) < -\varepsilon\}$
must tend to the empty set $\emptyset$ as $n \to \infty$ for every
$\varepsilon > 0$. However,

$\lim_{n \to \infty} P(A_n) = \left(\int_{\varepsilon}^{\infty} g(u)\,du\right)^2 \ne 0$.


Hence the measure $P$ defined by (1) and (2) is not continuous at
$\emptyset$ (or, equivalently, $P$ is not $\sigma$-additive) and thus
$P$ cannot be extended to the $\sigma$-field $\mathcal{F}$. Moreover,
for any probability measure $P$ on $(\Omega, \mathcal{F})$ we have
$P(\Omega) = 1$ which means that with probability 1 every trajectory
would be continuous. However, as we have shown, the family
$\mathcal{P}$ is not consistent with this fact despite its
compatibility.

In particular, note that (2) implies that $(X_t, t \in [0, 1])$ is a
set of r.v.s which are independent and each is distributed
symmetrically with density $g$. The independence between $X_t$ and
$X_s$, even for very close $s$ and $t$, is inconsistent with the
desired continuity of the process.

This example and others show that the family, even though 
compatible, must satisfy additional conditions in order to obtain a 
stochastic process whose trajectories possess specific properties 
(see Prohorov 1956; Billingsley 1968). 


19.3. Stochastic processes whose modifications possess 
quite different properties 


Let $X = (X_t, t \in T)$ and $Y = (Y_t, t \in T)$ be two stochastic
processes defined on the same probability space
$(\Omega, \mathcal{F}, P)$ and taking values in the same state space
$(E, \mathcal{E})$. We say that $Y$ is a modification of $X$ (and
conversely) if for each $t \in T$,
$P[\omega : X_t(\omega) \ne Y_t(\omega)] = 0$. If we have

$P[\bigcup_{t \in T} \{\omega : X_t(\omega) \ne Y_t(\omega)\}] = 0$

then the processes $X$ and $Y$ are called indistinguishable.
The following examples illustrate the relationship between these
two notions and show that two processes can have very different
properties even if one of the processes is a modification of the
other.


(i) Note firstly that if the parameter set $T$ is countable, then $X$
and $Y$ are indistinguishable iff $Y$ is a modification of $X$. Thus
some differences can arise only if $T$ is not countable.

So, define the probability space $(\Omega, \mathcal{F}, P)$ as
follows: $\Omega = \mathbb{R}^+$, $\mathcal{F} = \mathcal{B}^+$ and
$P$ is any absolutely continuous probability distribution. Take
$T = \mathbb{R}^+$ and consider the processes
$X = (X_t, t \in \mathbb{R}^+)$ and $Y = (Y_t, t \in \mathbb{R}^+)$
where $X_t(\omega) = 0$ and $Y_t(\omega) = 1_{\{t\}}(\omega)$.
Obviously, $Y$ is a modification of $X$ and this fact is a consequence
of the absolute continuity of $P$. Nevertheless the processes $X$ and
$Y$ are not indistinguishable, as is easily seen.


(ii) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$, $P$
be the Lebesgue measure and $T = \mathbb{R}^+$. As usual, denote by
$[t]$ the integer part of $t$. Consider two processes
$X = (X_t, t \in \mathbb{R}^+)$ and $Y = (Y_t, t \in \mathbb{R}^+)$
where $X_t(\omega) = 0$ for all $\omega$ and all $t$, and

$Y_t(\omega) = 0$ if $t - [t] \ne \omega$, $Y_t(\omega) = 1$ if $t - [t] = \omega$.

It is obvious that $Y$ is a modification of $X$. Moreover, all
trajectories of $X$ are continuous while all trajectories of $Y$ are
discontinuous. (A similar fact holds for the processes $X$ and $Y$ in
case (i).)

(iii) Let $\tau$ be a non-negative r.v. with an absolutely continuous
distribution. Define the processes $X = (X_t, t \in \mathbb{R}^+)$ and
$Y = (Y_t, t \in \mathbb{R}^+)$ where

$X_t = X_t(\omega) = 1_{[\tau(\omega) \le t]}(\omega)$, $Y_t = Y_t(\omega) = 1_{[\tau(\omega) < t]}(\omega)$, $X_0 = 0$, $Y_0 = 0$.

It is easy to see that for each $t \in \mathbb{R}^+$ we have

$P[\omega : X_t(\omega) \ne Y_t(\omega)] = P[\omega : \tau(\omega) = t] = 0$.

Hence each of the processes $X$ and $Y$ is a modification of the
other. But let us look at their trajectories. Clearly $X$ is
right-continuous with left-hand limits, while $Y$ is left-continuous
with right-hand limits.
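Case (iii) can be illustrated for a single realization of the random time. The sketch below is only an illustration: the value 0.5 standing for the realized random time and the grid of time points are hypothetical choices. It shows that the two trajectories differ at exactly one time point, the realized random time itself, which for each fixed t is an event of probability zero.

```python
def X(t, tau):
    """Indicator of [tau <= t]: a right-continuous trajectory."""
    return 1 if tau <= t else 0


def Y(t, tau):
    """Indicator of [tau < t]: a left-continuous trajectory."""
    return 1 if tau < t else 0


# Hypothetical realization of the random time and a grid of time points.
tau = 0.5
times = [0.0, 0.25, 0.5, 0.75, 1.0]

# Time points at which the two trajectories take different values.
disagreements = [t for t in times if X(t, tau) != Y(t, tau)]
```

At the jump time the process X already equals 1 while Y still equals 0, which is the difference between right-continuity and left-continuity.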


19.4. On the separability property of stochastic processes 


Let $X = (X_t, t \in T \subset \mathbb{R}^+)$ be a stochastic process
defined on the probability space $(\Omega, \mathcal{F}, P)$ and taking
values in the measurable space $(\mathbb{R}^1, \mathcal{B}^1)$. The
process $X$ is said to be separable if there exists a countable dense
subset $S_0 \subset T$ such that for every closed set
$B \in \mathcal{B}^1$ and every open set $I \subset \mathbb{R}^1$,

$\{\omega : X_t(\omega) \in B \text{ for all } t \in T \cap I\} = \{\omega : X_s(\omega) \in B \text{ for all } s \in S_0 \cap I\}$.

Clearly, if the process $X$ is separable, then any event associated
with $X$ can be represented by countably many operations like union
and intersection. The last situation, as we know, is typical in
probability theory. However, not every stochastic process is
separable.


(i) Let $\tau$ be a r.v. distributed uniformly on the interval
$[0, 1]$. Consider the process $X = (X_t, t \in [0, 1])$ where

$X_t = X_t(\omega) = 1$ if $\tau(\omega) = t$, $X_t(\omega) = 0$ if $\tau(\omega) \ne t$.

If $S$ is any countable subset of $[0, 1]$ we have

$P[X_t = 0 \text{ for } t \in S] = 1$, $P[X_t = 0 \text{ for } t \in [0, 1]] = 0$.

Therefore the process $X$ is not separable.


(ii) Consider the probability space $(\Omega, \mathcal{F}, P)$ where
$\Omega = [0, 1]$, $\mathcal{F}$ is the $\sigma$-field of
Lebesgue-measurable sets of $[0, 1]$ and $P$ is the Lebesgue measure.
Let $T = [0, 1]$ and $A$ be a non-Lebesgue-measurable set contained in
$[0, 1]$ (the construction of such sets is described by Halmos 1974).
Define the function $X = (X_t, t \in T)$ by

$X_t(\omega) = 1$ if $t \in A$ and $\omega = t$, $X_t(\omega) = 0$ otherwise.

Then for each $t \in T$, $t \notin A$, we have $X_t(\omega) = 0$ for
all $\omega \in \Omega$. Further, for each $t \in T$, $t \in A$, we
have $X_t(\omega) = 0$ for all $\omega \in \Omega$ except for
$\omega = t$ when $X_t(\omega) = 1$. Thus for every $t \in T$,
$X_t(\omega)$ is $\mathcal{F}$-measurable and hence $X$ is a
stochastic process.

Let us note that for each $\omega \in \Omega$, $\omega \notin A$, we
have $X_t(\omega) = 0$ for all $t \in T$, and for each
$\omega \in \Omega$, $\omega \in A$, we have $X_t(\omega) = 0$ except
for $t = \omega$ when $X_t(\omega) = 1$. Therefore every sample
function of $X$ is a Lebesgue-measurable function. Suppose now that
the process $X$ is separable. Then there would be a countable dense
subset $S_0 \subset T$ such that for every closed $B$ and open $I$,

$\{\omega : X_t(\omega) \in B \text{ for all } t \in T \cap I\} = \{\omega : X_s(\omega) \in B \text{ for all } s \in S_0 \cap I\}$

and both events belong to the $\sigma$-field $\mathcal{F}$. Take in
particular $B = [0, \tfrac{1}{2}]$ and $I = \mathbb{R}^1$. Then the
event

$\{\omega : X_t(\omega) \in [0, \tfrac{1}{2}] \text{ for all } t \in T\} = [0, 1] \setminus A$

does not belong to $\mathcal{F}$. Hence the process $X$ is not
separable.

The processes considered in cases (i) and (ii) have modifications
which are separable. For very general results concerning the 
existence of separable modifications of stochastic processes we 


refer the reader to the books by Doob (1953, 1984), Gihman and 
Skorohod (1974/1979), Yeh (1973), Ash and Gardner (1975) and 
Rao (1979, 1995). 


19.5. Measurable and progressively measurable 
stochastic processes 


Consider the process $X = (X_t, t \ge 0)$ defined on the probability
basis $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ and taking values in
some measurable space $(E, \mathcal{E})$. Here $(\mathcal{F}_t)$ is a
filtration satisfying the usual conditions (see the introductory
notes). Recall that if for each $t$, $X_t$ is
$\mathcal{F}_t$-measurable, we say that the process $X$ is
$(\mathcal{F}_t)$-adapted. The process $X$ is said to be measurable if
the mapping $(t, \omega) \mapsto X_t(\omega)$ of
$\mathbb{R}^+ \times \Omega$ to $E$ is measurable with respect to the
product $\sigma$-field $\mathcal{B}^+ \times \mathcal{F}$. Finally, the
process $X$ is called progressively measurable (or simply, a
progressive process) if for each $t$, the map
$(s, \omega) \mapsto X_s(\omega)$ of $[0, t] \times \Omega$ to $E$ is
$\mathcal{B}_{[0,t]} \times \mathcal{F}_t$-measurable.

Now let us consider examples to answer the following questions, 
(a) Does every process have a measurable modification? (b) What 
is the relationship between measurability and _ progressive 
measurability? 


(i) Let $X = (X_t, t \in [0, 1])$ be a stochastic process consisting
of mutually independent r.v.s such that $EX_t = 0$ and
$E[X_t^2] = 1$, $t \in [0, 1]$. We want to know if this process is
measurable. Suppose the answer is positive: that is, there exists a
$(t, \omega)$-measurable family $(X_t(\omega))$ with these properties:
$EX_t = 0$ for $t \in [0, 1]$, $E[X_s X_t] = 0$ if $s \ne t$ and
$E[X_s X_t] = 1$ if $s = t$. It follows that for every subinterval $I$
of $[0, 1]$ we should have

$\int_I \int_I E[|X_s X_t|]\,ds\,dt < \infty$.

Hence using the Fubini theorem we obtain

$E\left\{\left(\int_I X_t\,dt\right)^2\right\} = E\left\{\int_I \int_I X_s(\omega) X_t(\omega)\,ds\,dt\right\} = \int_I \int_I E[X_s X_t]\,ds\,dt = 0$.

Thus for a set $N_I$ with $P(N_I) = 0$ we have
$\int_I X_t(\omega)\,dt = 0$ if $\omega \notin N_I$. Consider now all
subintervals $I = [r', r'']$ with rational endpoints $r'$, $r''$ and
let $N = \bigcup_I N_I$. Then $P(N) = 0$ and for all $\omega$ in the
complement $N^c$ of $N$ we have $\int_a^b X_t(\omega)\,dt = 0$ for
any subinterval $[a, b]$ of $[0, 1]$. This means that for
$\omega \in N^c$, $X_t(\omega) = 0$ for all $t$ except possibly for a
set of Lebesgue measure zero. Applying the Fubini theorem again we
find that

$\int_0^1 \int_{\Omega} X_t^2(\omega)\,P(d\omega)\,dt = 0$.

However, this is not possible, since

$\int_0^1 \int_{\Omega} X_t^2(\omega)\,P(d\omega)\,dt = \int_0^1 E[X_t^2]\,dt = 1$.

This contradiction shows that the process $X$ is not measurable.
Moreover, the same reasoning shows that $X$ does not have a
measurable modification.


(ii) Consider now a situation which could be compared with case
(i). Let $X = (X_t, t \in T \subset \mathbb{R}^1)$ be a second-order
stochastic process ($E[X_t^2] < \infty$ for all $t \in T$) and let
$C(s, t) = E[X_s X_t]$ be its covariance function. If $X$ is a
measurable process, then it follows from the Fubini theorem that
$C(s, t)$ is a $\mathcal{B}_T \times \mathcal{B}_T$-measurable
function. This fact leads naturally to the question: does the
measurability of the covariance function $C(s, t)$ imply that the
process $X$ is measurable? The example below shows that the
answer is negative.

Let $T = [0, 1]$ and $X = (X_t, t \in [0, 1])$ be a family of
zero-mean r.v.s of unit variance and such that $X_s$ and $X_t$ for
$s \ne t$ are uncorrelated: that is, $EX_t = 0$ for $t \in [0, 1]$,
and $C(s, t) = E[X_s X_t] = 0$ if $s \ne t$, $C(s, t) = 1$ if
$s = t$, $s, t \in [0, 1]$.

Since $C$ is symmetric and non-negative definite, there exists a
probability space $(\Omega, \mathcal{F}, P)$ and on it a real-valued
process $X = (X_t, t \in [0, 1])$ with $C$ as its covariance function.
Obviously, the given function $C$ is
$\mathcal{B}_{[0,1]} \times \mathcal{B}_{[0,1]}$-measurable. Denote by
$H(X)$ the closure in $L^2 = L^2(\Omega, \mathcal{F}, P)$ of the
linear space generated by the r.v.s $\{X_t, t \in [0, 1]\}$; $H(X)$ is
called a linear space of the process $X$. According to Cambanis
(1975) the following two statements are equivalent: (a) the process
$X$ has a measurable modification; (b) the covariance function $C$ is
$\mathcal{B}_T \times \mathcal{B}_T$-measurable and $H(X)$ is a
separable space.

Now, since the values of $X$ are orthogonal in $L^2$, that is
$E[X_s X_t] = 0$ for $s \ne t$, $s, t \in [0, 1]$, the space $H(X)$
is not separable and therefore the process $X$ does not have a
measurable modification.

The same conclusion can be derived in the following way.
Suppose $X$ has a measurable modification, say
$Y = (Y_t, t \in [0, 1])$. Then

(1) $E\left[\int_0^1 Y_t^2\,dt\right] = \int_0^1 C(t, t)\,dt = 1$

and this relation implies that $\int_0^1 Y_t^2\,dt < \infty$ a.s. Let
$\{\varphi_n(t), n \ge 1\}$ be a complete orthonormal system in the
space $L^2[0, 1] = L^2([0, 1], \mathcal{B}_{[0,1]}, \text{Leb})$ of
all functions $f(t)$, $t \in [0, 1]$, which are
$\mathcal{B}_{[0,1]}$-measurable and square-integrable:
$\int_0^1 f^2(t)\,dt < \infty$. Then (see Loéve 1978)

$Y_t = \sum_{n=1}^{\infty} \xi_n \varphi_n(t)$

in $L^2[0, 1]$ where $\xi_n = \int_0^1 Y_t \varphi_n(t)\,dt$ a.s.
Further, we have

$E[\xi_n^2] = \int_0^1 \int_0^1 C(s, t)\varphi_n(s)\varphi_n(t)\,ds\,dt = 0$,

that is $P[\xi_n = 0] = 1$, and hence

$\int_0^1 Y_t^2\,dt = \sum_{n=1}^{\infty} \xi_n^2 = 0$ a.s.

which contradicts equality (1). Therefore the process $X$ with the
covariance function $C$ does not have a measurable modification.


(iii) Here we suggest a brief analysis of the usual and the 
progressive measurability of stochastic processes. 

Let $X = (X_t, t \ge 0)$ be an $(\mathcal{F}_t)$-progressive process.
Obviously $X$ is $(\mathcal{F}_t)$-adapted and measurable. Is it then
true that every $(\mathcal{F}_t)$-adapted and measurable process is
progressive? The following example shows that this is not the case.

Let $\Omega = \mathbb{R}^+$, $\mathcal{F} = \mathcal{B}^+$ and
$P(dx) = e^{-x}\,dx$ where $dx$ corresponds to the Lebesgue measure.
Define $\Delta = \{(x, x), x \in \mathbb{R}^+\}$ and let
$\mathcal{F}_t$ for each $t \in \mathbb{R}^+$ be the $\sigma$-field
generated by the points of $\mathbb{R}^+$ (this means that
$A \in \mathcal{F}_t$ iff $A$ or $A^c$ is countable). Consider the
process $X = (X_t, t \in \mathbb{R}^+)$ where

$X_t(\omega) = 1_{\Delta}(t, \omega)$.

Then the process $X$ is $(\mathcal{F}_t)$-adapted and
$\mathcal{B}^+ \times \mathcal{F}$-measurable but is not
progressively measurable.


It is useful to cite the following result: if $X$ is an adapted and
right-continuous stochastic process on the probability basis
$(\Omega, \mathcal{F}, (\mathcal{F}_t, t \ge 0), P)$ and takes values
in the metric space $(E, \mathcal{E})$, then $X$ is progressively
measurable. The proof of this result as well as a detailed
presentation of many other results concerning measurability
properties of stochastic processes can be found in the books by Doob
(1953, 1984), Dellacherie and Meyer (1978, 1982), Dudley (1972),
Elliott (1982), Rogers and Williams (1994) and Rao (1995).


19.6. On the stochastic continuity and the weak $L^1$-continuity
of stochastic processes


Let $X = (X(t), t \in T)$ be a stochastic process where $T$ is an
interval in $\mathbb{R}^1$. We say that $X$ is stochastically
continuous ($P$-continuous) at a fixed point $t_0 \in T$ if for
$t \in T$, $X(t) \xrightarrow{P} X(t_0)$ as $t \to t_0$. The process
$X$ is said to be stochastically continuous if it is $P$-continuous
at all points of $T$.

A second-order process $X = (X(t), t \in T \subset \mathbb{R}^1)$ is
called weakly $L^1$-continuous if for every $t \in T$ and every r.v.
$\xi$ with $E[\xi^2] < \infty$ we have

$\lim_{s \to t} E[X(s)\xi] = E[X(t)\xi]$.

We now consider two specific examples. The first one shows
that not every process is stochastically continuous, while the
second examines the relationship between the two notions
discussed above.


(i) Let the process $X = (X(t), t \in [0, 1])$ consist of i.i.d.
r.v.s with a common density $g(x)$, $x \in \mathbb{R}^1$. Let
$t_0, t \in [0, 1]$, $t \ne t_0$ and $\varepsilon > 0$. Then

$p_{\varepsilon} := P[|X(t) - X(t_0)| > \varepsilon] = \iint_{|x - y| > \varepsilon} g(x)g(y)\,dx\,dy$.

Obviously, if $\varepsilon \to 0$ then

$p_{\varepsilon} \to \iint_{x \ne y} g(x)g(y)\,dx\,dy = \iint g(x)g(y)\,dx\,dy = 1$.

This means that for some $\varepsilon_0 > 0$,
$p_{\varepsilon_0} \ge \tfrac{1}{2}$ and hence

$P[|X(t) - X(t_0)| > \varepsilon_0] \not\to 0$ as $t \to t_0$.

Therefore the process $X$ is not stochastically continuous at any
$t \in [0, 1]$.


(ii) Let the probability space $(\Omega, \mathcal{F}, P)$ be defined
by $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ with $P$ the
Lebesgue measure. Take the sequence of r.v.s $\{\eta_n, n \ge 1\}$
where $\eta_n(\omega) = n^{3/4} 1_{[0,1/n]}(\omega)$. Then for
sufficiently small $\varepsilon > 0$ we have

$P[\omega : |\eta_n(\omega)| > \varepsilon] = \frac{1}{n} \to 0$

and hence $\eta_n \xrightarrow{P} 0$ as $n \to \infty$. However,
$E[\eta_n^2] = n^{1/2}$, the sequence $\{\eta_n\}$ is not bounded and
consequently $\{\eta_n\}$ is not weakly $L^1$-convergent (see Masry
and Cambanis 1973).

Our plan now is to use the sequence $\{\eta_n\}$ to construct a
stochastic process which is stochastically but not weakly
$L^1$-continuous.

Define the process $X = (X(t), t \in [0, 1])$ by $X(t) = 0$ for
$t = 0$, $\omega \in \Omega$ and

$X(t) = (n+1)(1 - nt)\,\eta_{n+1}(\omega) + n((n+1)t - 1)\,\eta_n(\omega)$

for $t \in [\frac{1}{n+1}, \frac{1}{n}]$ and $\omega \in \Omega$,
$n \ge 1$. Thus $X(0) = 0$, $X(\frac{1}{n}) = \eta_n$, $n \ge 1$, and
for all $\omega \in \Omega$, $X(\cdot, \omega)$ is a linear function
on every interval $[\frac{1}{n+1}, \frac{1}{n}]$, $n \ge 1$. Since
$X(\frac{1}{n}) = \eta_n$, and the sequence $\{\eta_n\}$ does not
converge weakly in $L^1$, then the process $X$ is not weakly
$L^1$-continuous at the point $t = 0$.

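It is easy to see that for all $\omega \in \Omega$ the process $X$ is
continuous on $(0, 1]$ and hence $X$ is stochastically continuous on
the same interval $(0, 1]$. Clearly, it remains for us to show that
$X$ is $P$-continuous at $t = 0$. Fix $\varepsilon > 0$ and
$\delta > 0$. Since $\eta_n \xrightarrow{P} 0$, there exists
$N = N(\varepsilon, \delta)$ such that for all $n \ge N$,
$P[|\eta_n| \ge \varepsilon] < \tfrac{1}{2}\delta$. Now for all
$t \in (0, N^{-1}]$ we have $t \in [\frac{1}{n+1}, \frac{1}{n}]$ for
some concrete $n \ge N$, and it follows from the definition of $X$
that

$0 \le |X(t)| \le \max\{|\eta_n|, |\eta_{n+1}|\}$ for all $\omega \in \Omega$.

Thus

$P[\omega : |X(t)| \ge \varepsilon] \le P[\omega : \max\{|\eta_n|, |\eta_{n+1}|\} \ge \varepsilon]$
$\le P[\omega : |\eta_n| \ge \varepsilon] + P[\omega : |\eta_{n+1}| \ge \varepsilon] < \tfrac{1}{2}\delta + \tfrac{1}{2}\delta = \delta$

implying that

$X(t) \xrightarrow{P} 0 = X(0)$ as $t \to 0$

and hence the process $X$ is stochastically continuous on the
interval $[0, 1]$.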
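The construction in case (ii) can be checked on concrete values. The following is a minimal sketch, assuming that time points are passed as exact fractions in [0, 1]; all function names are illustrative and not taken from the original text.

```python
from fractions import Fraction
import math


def eta(n, omega):
    """eta_n(omega) = n^(3/4) on [0, 1/n] and 0 elsewhere."""
    return n ** 0.75 if 0 <= omega <= 1 / n else 0.0


def prob_eta_exceeds(n):
    """P[|eta_n| > eps] for small eps > 0: the Lebesgue measure of [0, 1/n]."""
    return Fraction(1, n)


def second_moment_eta(n):
    """E[eta_n^2] = (n^(3/4))^2 * (1/n), which equals n^(1/2)."""
    return (n ** 0.75) ** 2 / n


def X(t, omega):
    """Piecewise linear interpolation between eta_{n+1} at 1/(n+1) and eta_n at 1/n."""
    t = Fraction(t)
    if t == 0:
        return 0.0
    n = math.floor(1 / t)  # then 1/(n+1) < t <= 1/n
    weight_next = float((n + 1) * (1 - n * t))
    weight_this = float(n * ((n + 1) * t - 1))
    return weight_next * eta(n + 1, omega) + weight_this * eta(n, omega)
```

The two weights are non-negative and add up to 1, so the value of the process at t is a convex combination of the two neighbouring variables. This is the reason for the bound by the maximum used in the proof of stochastic continuity at 0, while the second moments grow like the square root of n.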
Note that in this example the weak $L^1$-continuity of $X$ is violated
at the point $t = 0$ only. Using arguments from the paper by Masry
and Cambanis (1973) we can construct a process which is
stochastically continuous and weakly $L^1$-discontinuous at a finite
or even at a countable number of points in $[0, 1]$.


19.7. Processes which are stochastically continuous but 
not continuous almost surely 


We know that convergence in probability is weaker than a.s.
convergence (see Section 14). So it is not surprising that there are 
processes which are stochastically but not a.s. continuous. 
Consider the following two examples. 


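(i) Let $X = (X_t, t \in [0, 1])$ be a stochastic process defined on
the probability space $(\Omega, \mathcal{F}, P)$ where
$\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$, $P$ is the
Lebesgue measure and

$X_t(\omega) = 1$ if $t \ge \omega$, $X_t(\omega) = 0$ if $t < \omega$.

The state space of $X$ consists of the values 1 and 0, and the
finite-dimensional distributions are expressed as follows: for
$0 \le t_1 < t_2 < \cdots < t_n \le 1$,

$P[X_{t_1} = 1, \ldots, X_{t_n} = 1] = t_1$,

$P[X_{t_1} = 0, \ldots, X_{t_i} = 0, X_{t_{i+1}} = 1, \ldots, X_{t_n} = 1] = t_{i+1} - t_i$,

$P[X_{t_1} = 0, \ldots, X_{t_n} = 0] = 1 - t_n$.

In all other cases $P[X_{t_1} = k_1, \ldots, X_{t_n} = k_n] = 0$ where
$k_i = 0$ or 1.

Clearly, the process $X$ is stochastically continuous since for any
$\varepsilon \in (0, 1)$ and $t_1 < t_2$ we have

$P[|X_{t_2} - X_{t_1}| \ge \varepsilon] = P[X_{t_1} = 0, X_{t_2} = 1] = |t_2 - t_1|$.

However, almost all trajectories of $X$ are discontinuous functions.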
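The probability that the process changes its value between two time points can be approximated by replacing the Lebesgue measure on [0, 1] with a fine uniform grid of sample points. This is only a sketch under that assumption; the function names and the grid size are hypothetical.

```python
def X(t, omega):
    """X_t(omega) = 1 if t >= omega and 0 otherwise: one jump, located at omega."""
    return 1 if t >= omega else 0


def prob_paths_differ(t1, t2, grid_size=10_000):
    """Approximate P[X_{t1} != X_{t2}] using midpoints of a uniform grid on [0, 1]."""
    omegas = ((i + 0.5) / grid_size for i in range(grid_size))
    differing = sum(1 for w in omegas if X(t1, w) != X(t2, w))
    return differing / grid_size


# The approximation should be close to |t2 - t1|.
estimate = prob_paths_differ(0.2, 0.5)
```

The estimate shrinks with the distance between the two time points, in line with stochastic continuity, although every trajectory with a positive sample point has a jump.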


(ii) Consider the Poisson process $X = (X_t, t \ge 0)$ with a given
parameter $\lambda$. That is, $X_0 = 0$ a.s., the increments of $X$
are independent, and $X_t - X_s$ for $t > s$ has a Poisson
distribution:
$P[X_t - X_s = k] = e^{-\lambda(t-s)}[\lambda(t - s)]^k/k!$,
$k = 0, 1, 2, \ldots$. The definition immediately implies that for
each fixed $t_0 \ge 0$

$X_t \xrightarrow{P} X_{t_0}$ as $t \to t_0$.

Hence $X$ is stochastically continuous. However, it can be shown
that every trajectory of $X$ is a non-decreasing stepwise function
with jumps of size 1 only. This and other results can be found e.g.
in the book by Wentzell (1981).

Therefore the Poisson process is stochastically continuous but
not a.s. continuous.
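The stochastic continuity of the Poisson process can be read off from an explicit formula. Since an increment takes integer values, for any threshold between 0 and 1 the event that the increment is at least the threshold in absolute value coincides with the event that the increment is non-zero. The sketch below evaluates this probability; the function name and the parameter values are illustrative assumptions.

```python
import math


def prob_increment_nonzero(lam, t, t0):
    """P[X_t != X_{t0}] for a Poisson process with rate lam.

    The increment over an interval of length |t - t0| is Poisson with mean
    lam * |t - t0|, so it is zero with probability exp(-lam * |t - t0|).
    """
    return 1.0 - math.exp(-lam * abs(t - t0))


# Hypothetical rate and a sequence of time points approaching t0 = 1.
rate = 2.0
gaps = [1.0, 0.1, 0.001]
probabilities = [prob_increment_nonzero(rate, 1.0 + gap, 1.0) for gap in gaps]
```

The probabilities decrease to 0 as the gap shrinks, even though each trajectory is a step function with jumps.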


19.8. Almost sure continuity of stochastic processes and 
the Kolmogorov condition 


Let $X = (X_t, t \in T \subset \mathbb{R}^+)$ be a real-valued
stochastic process defined on some probability space
$(\Omega, \mathcal{F}, P)$. Suppose $X$ satisfies the following
classical Kolmogorov condition:

(1) $E[|X_t - X_s|^p] \le K|t - s|^{1+q}$, $K$ = constant, $p > 0$, $q > 0$, $t, s \in T$.

Then $X$ is a.s. continuous. In other words, almost all of the
trajectories of $X$ are continuous functions. The same result can be
expressed in another form: if condition (1) is valid, a process
$\tilde{X} = (\tilde{X}_t, t \in T)$ exists which is a.s. continuous
and is a modification of the process $X$.

Since (1) is a sufficient condition for the continuity of a
stochastic process, it is natural to ask whether this condition is
necessary.

Firstly, if we consider the Poisson process (see Example 19.7)
again we can easily see that condition (1) is not satisfied. Of
course, we cannot make further conclusions from this fact.
However, we know by other arguments that the Poisson process is
not continuous.

Consider now the standard Wiener process $w = (w_t, t \ge 0)$.
Recall that $w_0 = 0$ a.s., the increments of $w$ are independent, and
$w_t - w_s \sim N(0, |t - s|)$. It is easy to check that $w$ satisfies
(1) with $p = 4$ and $q = 1$. Hence the Wiener process $w$ is a.s.
continuous.
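The claim about the Wiener process rests on the fourth moment of a normal distribution: an increment over an interval of length d is N(0, d), and its fourth absolute moment equals 3d^2, so condition (1) holds with p = 4, q = 1 and K = 3. The sketch below checks this identity by a simple midpoint rule; the function name, the number of steps and the truncation width are illustrative assumptions.

```python
import math


def normal_abs_moment(p, var, steps=20_000, width=12.0):
    """Approximate E|Z|^p for Z ~ N(0, var) with the midpoint rule.

    The integral is truncated to [-width * sd, width * sd], where the
    neglected Gaussian tails are negligible.
    """
    sd = math.sqrt(var)
    h = 2.0 * width * sd / steps
    total = 0.0
    for i in range(steps):
        x = -width * sd + (i + 0.5) * h
        density = math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        total += abs(x) ** p * density * h
    return total


# Interval lengths |t - s| chosen only for illustration.
lengths = [0.1, 1.0, 2.5]
fourth_moments = [normal_abs_moment(4, d) for d in lengths]
```

Each computed value can be compared with 3d^2 for the corresponding interval length d.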

Now based on the Wiener process we can construct an example
of a process which is continuous but does not satisfy condition (1).
Let $Y = (Y_t, t \ge 0)$ where $Y_t = \exp(w_t^3)$. This process is
a.s. continuous. However, for any $p > 0$ the expectation
$E[|Y_t - Y_s|^p]$ does not exist and thus condition (1) cannot be
satisfied. This example shows that the Kolmogorov condition (1) is
not generally necessary for the continuity of a stochastic process.

Important general results concerning continuity properties of 
stochastic processes, including some useful counterexamples, are 
given by Ibragimov (1983) and Balasanov and Zhurbenko (1985). 


19.9. Does the Riemann or Lebesgue integrability of the 
covariance function ensure the existence of the 
integral of a stochastic process? 


Suppose $X = (X_t, t \in [a, b] \subset \mathbb{R}^1)$ is a second-order real-valued
stochastic process with zero mean and covariance function
$\Gamma(s, t) = E[X_s X_t]$, $s, t \in [a, b]$. We should like to analyse the
conditions under which the integral $J = \int_a^b X_t\,dt$ can be
constructed. As usual, we consider integral sums of the type
$J_N = \sum_{k=1}^{N} X_{s_k}(t_k - t_{k-1})$, $s_k \in (t_{k-1}, t_k)$, and define $J$ as the
limit of $\{J_N\}$ in a definite sense. One reasonable approach is to
consider the convergence of the sequence $\{J_N\}$ in the $L^2$-sense. In
this case, if the limit exists, it is called an $L^2$-integral and is
denoted as $(L^2)\int_a^b X_t\,dt$.
According to results which are generally accepted as classical
(see Lévy 1965; Loève 1978), the integral $(L^2)\int_a^b X_t\,dt$ exists iff
the Riemann integral $(R)\int_a^b\int_a^b \Gamma(s, t)\,ds\,dt$ exists. Note however
that a paper by Wang (1982) provides some explanations of certain
differences in the interpretation of the double Riemann integral. As
an important consequence Wang has shown that the existence of
$(R)\int_a^b\int_a^b \Gamma(s, t)\,ds\,dt$ is a sufficient but not necessary condition for
the existence of $(L^2)\int_a^b X_t\,dt$. Let us consider this situation in more
detail.

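Starting with the points $a = x_0 < x_1 < \dots < x_m = b$ on the axis $Ox$
and $a = y_0 < y_1 < \dots < y_n = b$ on the axis $Oy$, we divide the square
$[a, b] \times [a, b]$ into rectangles in the standard way. Define

$$\Delta_1 = \max_{1 \le i \le m} \Delta x_i, \quad \Delta x_i = x_i - x_{i-1}, \qquad \Delta_2 = \max_{1 \le j \le n} \Delta y_j, \quad \Delta y_j = y_j - y_{j-1}.$$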


Introduce the following two conditions:

(1)  $\lim_{\Delta_1 \to 0,\, \Delta_2 \to 0} \sum_{i=1}^{m} \sum_{j=1}^{n} \Gamma(u_i, v_j)\,\Delta x_i\,\Delta y_j$ exists;

(2)  $\lim_{\Delta_1 \to 0,\, \Delta_2 \to 0} \sum_{i=1}^{m} \sum_{j=1}^{n} \Gamma(u_{ij}, v_{ij})\,\Delta x_i\,\Delta y_j$ exists.

(Here $u_i \in (x_{i-1}, x_i)$, $v_j \in (y_{j-1}, y_j)$ and $(u_{ij}, v_{ij}) \in (x_{i-1}, x_i) \times (y_{j-1}, y_j)$.)


It is important to note that the Riemann integral
$(R)\int_a^b\int_a^b \Gamma(x, y)\,dx\,dy$ exists iff condition (2) is fulfilled (not
condition (1)). On the other hand, the integral $(L^2)\int_a^b X_t\,dt$ exists iff
condition (1) is fulfilled. Since (2) $\Rightarrow$ (1), then obviously the
existence of $(R)\int_a^b\int_a^b \Gamma(x, y)\,dx\,dy$ is a sufficient condition for
the existence of $(L^2)\int_a^b X_t\,dt$.

Following Wang (1982) we describe a stochastic process $X =
(X_t, t \in [0, 1])$ such that its covariance function $\Gamma(s, t)$ is not
Riemann-integrable but the integral $(L^2)\int_0^1 X_t\,dt$ does exist. In
particular, this example will show that in general (1) $\not\Rightarrow$ (2), that is
conditions (1) and (2) are not equivalent.

For the construction of the process $X$ with the desired properties
we need some notation and statements. Suppose that $[a, b] = [0, 1]$.
Let 


and $j(2^k) \equiv j \pmod{2^k}$, $1 \le j(2^k) < 2^k$. Clearly, if $j$ is odd, then $j(2^k)$
is odd.
For $x \in B$, $x = (2^{2k} + j)/2^{2k+1}$, we define the function

$$g(x) = j(2^k)/2^k + j/2^{2k+1} = (2^{k+1} j(2^k) + j)/2^{2k+1}.$$


Let us now formulate four statements numbered (I), (II), (III)
and (IV). (For detailed proofs see Wang (1982).)

(I) If $x_1, x_2 \in B$ and $x_1 \ne x_2$, then $g(x_1) \ne g(x_2)$.
Now let

$$B' = \{x : x \in B,\ g(x) \notin B,\ x < g(x)\}, \qquad D = \{(x, y) : x \in B',\ y = g(x),\ (x, y) \in A\}.$$

(II) We have $D \subset A$ and for arbitrary $\delta > 0$ and $(x_0, y_0) \in A$ there
exists $(x, y) \in D$ such that $d[(x_0, y_0), (x, y)] < \delta$. (Here $d[\cdot, \cdot]$ is the
usual Euclidean distance in the plane.)
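Introduce the set

$$B'' = \{y : y = g(x),\ x \in B'\} \cap (\tfrac12, 1).$$

Then $B'' \cap B' = \emptyset$. In the square $[\frac12, 1] \times [\frac12, 1]$ we define the function
$\gamma(x, y)$ as follows.

1) If $(x, y) \in A = \{(x, y) : \frac12 < x < y < 1\}$ we put $\gamma(x, y) = 1$ if
$(x, y) \in D$, and $\gamma(x, y) = 0$ otherwise.

2) If $\frac12 < x = y < 1$, let $\gamma(x, y) = 1$ if $x = y \in B' \cup B''$, and
$\gamma(x, y) = 0$ otherwise.

3) If $\frac12 < y < x < 1$ we take $\gamma(x, y) = \gamma(y, x)$. For the boundary
points of the square let $\gamma(x, y) = 0$.

(III) The Riemann integral $(R)\int_{1/2}^{1}\int_{1/2}^{1} \gamma(x, y)\,dx\,dy$ does not exist.

(IV) $\lim_{\Delta_1 \to 0,\, \Delta_2 \to 0} \sum_{i=1}^{m} \sum_{j=1}^{n} \gamma(u_i, v_j)\,\Delta x_i\,\Delta y_j$ exists and is
zero.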

So, having statements (I) and (II) we can now define a stochastic
process whose covariance function equals $\gamma(s, t)$. For $t$ in the
interval $[\frac12, 1]$ and $t \in B'$, let $\xi_t$ be a r.v. distributed $N(0, 1)$. If $t \in B''$
then there exists a unique $s \in B'$ such that $t = g(s)$; let
$\xi_t = \xi_s$. If $t \notin B' \cup B''$, let $\xi_t = 0$. Then it is not difficult to find
that

$$\Gamma(s, t) = E[\xi_s \xi_t] = \gamma(s, t)$$

where $\gamma(s, t)$ is exactly the function introduced above.
It remains for us to apply statements (III) and (IV). Obviously,
(IV) implies the existence of $(L^2)\int_{1/2}^{1} \xi_t\,dt$. However, (III) shows
that the Riemann integral $(R)\int_{1/2}^{1}\int_{1/2}^{1} \Gamma(s, t)\,ds\,dt$ does not exist.

Therefore the Riemann integrability of the covariance function
is not necessary for the existence of the integral of the stochastic
process.

As we have seen, the existence of the integral $(L^2)\int_a^b X_t\,dt$ is
related to the Riemann integrability of the covariance function of
$X$. Thus we arrive at the question: is it possible to weaken this
condition, replacing it by the Lebesgue integrability? The next
example gives the answer.

Define the process $X = (X_t, t \in [0, 1])$ as follows:

$$X_t = \begin{cases} 0, & \text{if } t \text{ is irrational} \\ \eta, & \text{if } t \text{ is rational} \end{cases}$$

where $\eta$ is a r.v. distributed $N(0, 1)$.

It is easy to see that $\Gamma(s, t) = E[X_s X_t] = 1$ if both $s$ and $t$ are
rational, and $\Gamma(s, t) = 0$ otherwise. Since $\Gamma(s, t) \ne 0$ over a set of
plane Lebesgue measure zero, then $\Gamma(s, t)$ is Lebesgue-integrable
on the square $[0, 1] \times [0, 1]$ and $\int_0^1\int_0^1 \Gamma(s, t)\,ds\,dt = 0$. However,
the function $\Gamma(s, t)$ does not satisfy condition (1), which is
necessary and sufficient for the existence of $(L^2)\int_0^1 X_t\,dt$. Hence
the integral $(L^2)\int_0^1 X_t\,dt$ does not exist.

Therefore the Lebesgue integrability of the covariance function
$\Gamma$ of the process $X$ is not sufficient to ensure the existence of the
integral $(L^2)\int_0^1 X_t\,dt$.
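The failure of condition (1) for this covariance can be made concrete. The following is a minimal sketch, not part of the original example: because floating-point numbers cannot distinguish rational from irrational points, each tag point is represented only by a flag saying whether it is rational (an assumption of the sketch). On a uniform $n \times n$ partition of the unit square, the double sum in condition (1) equals 1 when all tags are rational and 0 when all tags are irrational, for every $n$, so no limit can exist.

```python
from fractions import Fraction

def gamma(s_is_rational, t_is_rational):
    """Covariance of the process: 1 if both time points are rational, else 0."""
    return 1 if (s_is_rational and t_is_rational) else 0

def double_sum(n, tags_are_rational):
    """Double sum of condition (1) on a uniform n x n partition of [0,1]^2.

    tags_are_rational is a hypothetical flag: True means every tag point
    u_i, v_j is chosen rational, False means every tag point is irrational.
    """
    cell = Fraction(1, n)
    total = Fraction(0)
    for _ in range(n):          # index i
        for _ in range(n):      # index j
            total += gamma(tags_are_rational, tags_are_rational) * cell * cell
    return total

for n in (4, 16, 64):
    print(n, double_sum(n, True), double_sum(n, False))
```

Both choices of tags are admissible in every cell since rationals and irrationals are dense, which is why the two values persist as the partition is refined.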


19.10. The continuity of a stochastic process does not 
imply the continuity of its own generated 
filtration, and vice versa 


Let $X = (X_t, t \ge 0)$ be a stochastic process defined on the
probability space $(\Omega, \mathcal{F}, P)$. Denote by $\mathcal{F}_t^X = \sigma\{X_s, s \le t\}$ the
smallest $\sigma$-field generated by the process $X$ up to time $t$. Clearly,
$\mathcal{F}_t^X \subset \mathcal{F}_{t_1}^X$ if $t < t_1$. The family $(\mathcal{F}_t^X, t \ge 0)$ is called the own
generated filtration of the process.

It is of general interest to clarify the relationship between the
continuity of the process $X$ and the continuity of the filtration $(\mathcal{F}_t^X)$.
Recall the following well known result (see Liptser and Shiryaev
1977/78): if $X = w$ is the standard Wiener process, then the
filtration $(\mathcal{F}_t^w, t \ge 0)$ is continuous.

Let us answer two questions. (a) Does the continuity of the
process $X$ imply that the filtration $(\mathcal{F}_t^X)$ is continuous? (b) Is it
possible to have a continuous filtration $(\mathcal{F}_t^X)$ which is generated by
a discontinuous process $X$?


(i) Let $\Omega = \mathbb{R}^1$, $\mathcal{F} = \mathcal{B}^1$ and $P$ be an arbitrary probability measure
on $\mathcal{B}^1$. Consider the process $X = (X_t, t \ge 0)$ where $X_t = X_t(\omega) =
t\xi(\omega)$ and $\xi$ is a r.v. distributed $N(0, 1)$. Obviously, the process $X$ is
continuous. Further, it is easy to see that for $t = 0$, $\mathcal{F}_0^X$ is the trivial
$\sigma$-field $\{\emptyset, \Omega\}$. If $t > 0$, we have $\mathcal{F}_t^X = \mathcal{B}^1$. Thus $\mathcal{F}_{0+}^X \ne \mathcal{F}_0^X$.
Therefore the filtration $(\mathcal{F}_t^X)$ is not right-continuous and hence not
continuous despite the continuity of $X$.


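(ii) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ be the Lebesgue measure.
Choose the function $h \in C^\infty(\mathbb{R})$ so that
$h(x) = 0$ for $x \le \frac12$ and for $x > \frac12$, $h(x) > 0$, and $h$ is strictly
increasing on the interval $[\frac12, \infty)$. (It is easy to find examples of
such functions.) Consider the process $X = (X_t, t \ge 0)$ where $X_t =
X_t(\omega) = \omega h(t)$, $\omega \in \Omega$, $t \ge 0$ and let $(\mathcal{F}_t^X, t \ge 0)$ be its own generated
filtration: $\mathcal{F}_t^X = \sigma\{X_s, s \le t\}$. Then it is easy to check that

$$\mathcal{F}_t^X = \begin{cases} \{\emptyset, \Omega\}, & \text{if } 0 \le t \le \frac12 \\ \mathcal{B}_{[0,1]}, & \text{if } t > \frac12. \end{cases}$$

Hence the filtration $(\mathcal{F}_t^X)$ is discontinuous even though the
trajectories of $X$ are in the space $C^\infty$.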
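One standard choice of such a function, given here as an illustrative assumption rather than the author's own, is $h(x) = \exp(-1/(x - \frac12))$ for $x > \frac12$ and $h(x) = 0$ otherwise. The sketch below evaluates it and shows that $X_t(\omega) = \omega h(t)$ carries no information about $\omega$ up to time $\frac12$ and determines $\omega$ immediately afterwards.

```python
import math

def h(x):
    """Smooth function: zero on (-inf, 1/2], positive and increasing on (1/2, inf)."""
    if x <= 0.5:
        return 0.0
    return math.exp(-1.0 / (x - 0.5))

def path(omega, times):
    """Trajectory X_t(omega) = omega * h(t) sampled at the given times."""
    return [omega * h(t) for t in times]

early = [0.0, 0.25, 0.5]
late = [0.75, 1.0, 2.0]
# Up to time 1/2 all trajectories coincide (identically zero) ...
print(path(0.2, early), path(0.9, early))
# ... while at any later time omega can be recovered as X_t / h(t).
print(path(0.2, late)[0] / h(0.75), path(0.9, late)[0] / h(0.75))
```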


(iii) Now we aim to show that the filtration $(\mathcal{F}_t^X)$ of the process $X$
can be continuous even if $X$ has discontinuous trajectories.

Firstly, let $h : \mathbb{R}^+ \to \mathbb{R}^1$ be any function. Then a countable
dense set $D \subset \mathbb{R}^+$ exists such that for all $t \ge 0$ there is a sequence
$\{t_n, n \ge 1\}$ in $D$ with $t_n \to t$ and $h(t_n) \to h(t)$ as $n \to \infty$. The
reasoning is as follows. Let $(\Omega, \mathcal{F}, P)$ be a one-point probability
space. Define $X_t(\omega) = h(t)$ for $\omega \in \Omega$ and all $t \ge 0$. Since the
extended real line $\bar{\mathbb{R}}^1$ is compact, the separability theorem (see
Doob 1953, 1984; Gihman and Skorohod 1974/1979; Ash and
Gardner 1975) implies that $(X_t, t \ge 0)$ has a separable version $(Y_t, t
\ge 0)$ with $Y_t : \Omega \to \bar{\mathbb{R}}^1$, $t \ge 0$. But $Y_t = X_t$ a.s. and so $Y_t(\omega) = X_t(\omega)
= h(t)$, $t \ge 0$.

Thus we can construct a class of stochastic processes which are 
separable but whose trajectories need not possess any useful 
properties (for example, X can be discontinuous, and even non- 
measurable; of course, everything depends on the properties of the 
function h which, let us repeat, can be chosen arbitrarily). 

Now take again the above one-point probability space $(\Omega, \mathcal{F}, P)$
and choose any function $h : \mathbb{R}^+ \to \mathbb{R}^1$. Define the process $X = (X_t,
t \ge 0)$ by $X_t = X_t(\omega) = h(t)$, $\omega \in \Omega$, $t \ge 0$. Then for all $t \ge 0$,
$\mathcal{F}_t^X = \sigma\{X_s, s \le t\}$ is a $P$-trivial $\sigma$-field in the sense that each
event $A \in \mathcal{F}_t^X$ has either $P(A) = 1$ or $P(A) = 0$. Therefore the
filtration $(\mathcal{F}_t^X, t \ge 0)$ is continuous. By the above result the process
$X$ is separable but its trajectories are equal to $h$, and $h$ is chosen
arbitrarily. It is enough to take $h$ as discontinuous.

Finally we can conclude that in general the continuity (and even
the infinite smoothness) of a stochastic process does not imply the
continuity of its own generated filtration (see case (i) and case (ii)).
On the other hand, a discontinuous process can generate a
continuous filtration (case (iii)).

The interested reader can find several useful results concerning 
fine properties of stochastic processes and their filtrations in the 
books by Dellacherie and Meyer (1978, 1982), Metivier (1982), 
Jacod and Shiryaev (1987), Revuz and Yor (1991) and Rao (1995). 


SECTION 20. MARKOV PROCESSES 


We recall briefly only a few basic notions concerning Markov 
processes. Some definitions will be given in the examples 
considered below. In a few cases we refer the reader to the existing 
literature. 


Firstly, let $X = (X_t, t \in T \subset \mathbb{R}^+)$ be a family of r.v.s on the
probability space $(\Omega, \mathcal{F}, P)$ such that for each $t$, $X_t$ takes values in
some countable set $E$. We say that $X$ is a Markov chain if it
satisfies the Markov property: for arbitrary $n \ge 1$, $t_1 < t_2 < \dots < t_n <
t$, $t_k, t \in T$, $i_1, \dots, i_n, j \in E$,

(1)  $P[X_t = j \mid X_{t_1} = i_1, \dots, X_{t_{n-1}} = i_{n-1}, X_{t_n} = i_n] = P[X_t = j \mid X_{t_n} = i_n].$

The chain $X$ is finite or infinite accordingly as the state space $E$ is
finite or infinite. If $T = \{0, 1, 2, \dots\}$ we write $X = (X_n, n \ge 0)$ or $X =
(X_n, n = 0, 1, \dots)$ and say that $X$ is a discrete-time Markov chain. If
$T = \mathbb{R}^+$ or $T = [a, b] \subset \mathbb{R}^+$ we say that $X = (X_t, t \ge 0)$ or $X = (X_t, t \in
[a, b])$ is a continuous-time Markov chain.

The probabilistic characteristics of any Markov chain can be
found if we know the initial distribution $(r_j, j \in E)$ where $r_j = P[X_0
= j]$, $r_j \ge 0$, $\sum_{j \in E} r_j = 1$ and the transition probabilities $p_{ij}(s, t) =
P[X_t = j \mid X_s = i]$, $t \ge s$, $i, j \in E$. The chain $X$ is called homogeneous if
$p_{ij}(s, t)$ depends on $s$ and $t$ only through $t - s$. In this case, if $X$ is a
discrete-time Markov chain, it is enough to know $(r_j, j \in E)$ and the
1-step transition matrix $P = (p_{ij})$ where $p_{ij} = P[X_{n+1} = j \mid X_n = i]$, $n
\ge 0$. The $n$-step transition probabilities form the matrix
$P^{(n)} = (p_{ij}^{(n)})$ and satisfy the relation

(2)  $p_{ij}^{(m+n)} = \sum_{k \in E} p_{ik}^{(m)} p_{kj}^{(n)}$

which is called the Chapman-Kolmogorov equation.
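For a homogeneous discrete-time chain relation (2) is simply matrix multiplication, $P^{(m+n)} = P^{(m)} P^{(n)}$. A minimal sketch with exact rational arithmetic follows; the 3-state matrix is a hypothetical example chosen only for illustration.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def n_step(p, n):
    """n-step transition matrix P^(n) of a homogeneous chain, n >= 1."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result

# Hypothetical one-step transition matrix (each row sums to 1).
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]

m, n = 2, 3
lhs = n_step(P, m + n)                      # P^(m+n)
rhs = mat_mul(n_step(P, m), n_step(P, n))   # P^(m) P^(n)
print(lhs == rhs)
```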
Note that the transition probabilities $p_{ij}(t)$ or $p_{ij}(s, t)$ of any
continuous-time Markov chain satisfy the so-called forward and
backward Kolmogorov equations.

In some of the examples below we assume that the reader is
familiar with basic notions and results in the theory of Markov
chains such as classification of the states, recurrence and transience
properties, irreducibility, aperiodicity, infinitesimal matrix and
Kolmogorov equations.

Now let us recall some more general notions. Let $X = (X_t, t \ge 0)$ be
a real-valued process on the probability space $(\Omega, \mathcal{F}, P)$ and $(\mathcal{F}_t, t
\ge 0)$ be its own generated filtration. We say that $X$ is a Markov
process with state space $(\mathbb{R}^1, \mathcal{B}^1)$ if it satisfies the Markov property:
for arbitrary $\Gamma \in \mathcal{B}^1$ and $t > s$,

(3)  $P[X_t \in \Gamma \mid \mathcal{F}_s] = P[X_t \in \Gamma \mid X_s]$ a.s.


This property can also be written in other equivalent forms. 

The function $P(s, x; t, \Gamma)$ defined for
$s, t \in \mathbb{R}^+$, $s \le t$, $x \in \mathbb{R}^1$, $\Gamma \in \mathcal{B}^1$ is said to be a transition
function if: (a) for fixed $s$, $t$ and $x$, $P(s, x; t, \cdot)$ is a probability
measure on $\mathcal{B}^1$; (b) $P(s, x; t, \Gamma)$ is $\mathcal{B}^1$-measurable in $x$ for fixed $s$, $t$,
$\Gamma$; (c) $P(s, x; s, \Gamma) = \delta_x(\Gamma)$ where $\delta_x(\Gamma)$ is the unit measure
concentrated at $x$; (d) the following relation holds:

$$P(s, x; t, \Gamma) = \int_{\mathbb{R}^1} P(s, x; u, dy)\,P(u, y; t, \Gamma), \qquad s \le u \le t.$$

This relation, called the Chapman-Kolmogorov equation, is the
continuous analogue of (2).

We say that $X = (X_t, t \ge 0)$ is a Markov process with transition
function $P(s, x; t, \Gamma)$ if $X$ satisfies (3) and

$$P[X_t \in \Gamma \mid X_s] = P(s, X_s; t, \Gamma) \quad \text{a.s.}$$

The Markov process $X$ is called homogeneous if its transition
function $P(s, x; t, \Gamma)$ depends on $s$ and $t$ only through $t - s$. In this
case we can introduce the function $P(t, x, \Gamma) = P(0, x; t, \Gamma)$ of three
arguments, $t \ge 0$, $x \in \mathbb{R}^1$, $\Gamma \in \mathcal{B}^1$, and express conditions (a)-(d) in a
simpler form.


Note that the strong Markov property will be introduced and 
compared with the usual Markov property (3) in one of the 
examples. 

Complete presentations of the theory of Markov chains in 
discrete and continuous time can be found in the books by Doob 
(1953), Chung (1960), Gihman and Skorohod (1974/1979), 
Isaacson and Madsen (1976) and Iosifescu (1980). Some important 
books are devoted to the general theory of Markov processes: see 
Dynkin (1961, 1965), Blumenthal and Getoor (1968), Rosenblatt 
(1971, 1974), Wentzell (1981), Chung (1982), Ethier and Kurtz 
(1986), Bhattacharya and Waymire (1990) and Rogers and
Williams (1994).

In this section we have included examples which examine the 
relationships between some similar notions or illustrate some of 
the basic properties of the Markov chains and processes. Note 
especially that many other useful results and counterexamples can 
be found in the recent publications indicated in the Supplementary 
Remarks. 


20.1. Non-Markov random sequences whose transition 
functions satisfy the Chapman-Kolmogorov 
equation 


Here we consider a few examples to illustrate the difference 
between the Markov property, which defines a Markov process, 
and the Chapman-Kolmogorov equation, which is a consequence 
of the Markov property. 


(i) Suppose an urn contains four balls numbered 1, 2, 3, 4.
Randomly we choose one ball, note its number and return it to the
urn. This procedure is repeated many times. Denote by $\xi_m$ the
number on the $m$th chosen ball. For $j = 1, 2, 3$ introduce the events
$A_j^{(m)} = \{\text{either } \xi_m = j \text{ or } \xi_m = 4\}$ and let $X_{3(m-1)+j} = 1$ if $A_j^{(m)}$
occurs, and 0 otherwise, $m = 1, 2, \dots$. Thus we have defined the
random sequence $(X_n, n \ge 1)$ and want to establish whether it
satisfies the Markov property and the Chapman-Kolmogorov
equation.
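If each of $k_1, k_2, k_3$ is 1 or 0, then

$$P[X_m = k_1] = \tfrac12, \qquad P[X_m = k_1, X_n = k_2] = \tfrac14, \quad n > m.$$

Therefore for $l < m < n$ we have

$$\tfrac12 = P[X_n = k_2 \mid X_l = k_1] = P[X_n = k_2 \mid X_m = 0]\,P[X_m = 0 \mid X_l = k_1] + P[X_n = k_2 \mid X_m = 1]\,P[X_m = 1 \mid X_l = k_1] = \tfrac12 \cdot \tfrac12 + \tfrac12 \cdot \tfrac12.$$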

This means that the transition probabilities of the sequence $\{X_n, n
\ge 1\}$ satisfy the Chapman-Kolmogorov equation. Further, the event
$[X_{3m-2} = 1, X_{3m-1} = 1]$ means that $\xi_m = 4$, which implies that $X_{3m} = 1$.
Thus

$$P[X_{3m} = 1 \mid X_{3m-2} = 1, X_{3m-1} = 1] = 1, \qquad m = 1, 2, \dots.$$

This relation shows that the Markov property does not hold for the
sequence $\{X_n, n \ge 1\}$. Therefore $\{X_n, n \ge 1\}$ is not a Markov chain
despite the fact that its transition probabilities satisfy the
Chapman-Kolmogorov equation.
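Because the blocks $(X_{3m-2}, X_{3m-1}, X_{3m})$ are independent and each is determined by a single draw, the claims can be checked by enumerating the four equally likely outcomes of one draw. The sketch below (helper names are illustrative) does this with exact fractions.

```python
from fractions import Fraction as F
from itertools import product

# One draw xi in {1, 2, 3, 4}, each with probability 1/4, determines the
# block (X_{3m-2}, X_{3m-1}, X_{3m}): coordinate j is 1 iff xi == j or xi == 4.
def block(xi):
    return tuple(1 if xi in (j, 4) else 0 for j in (1, 2, 3))

outcomes = [(block(xi), F(1, 4)) for xi in (1, 2, 3, 4)]

def prob(event):
    """Probability that a block satisfies the predicate `event`."""
    return sum(p for b, p in outcomes if event(b))

# Pairwise independence inside a block: P[X_a = k1, X_c = k2] = 1/4.
pairwise_ok = all(
    prob(lambda b: b[a] == k1 and b[c] == k2) == F(1, 4)
    for a, c in ((0, 1), (0, 2), (1, 2))
    for k1, k2 in product((0, 1), repeat=2)
)

# Markov property fails: given X_{3m-2} = X_{3m-1} = 1, X_{3m} = 1 surely,
# whereas P[X_{3m} = 1 | X_{3m-1} = 1] = 1/2.
cond_two = prob(lambda b: b == (1, 1, 1)) / prob(lambda b: b[0] == 1 and b[1] == 1)
cond_one = prob(lambda b: b[1] == 1 and b[2] == 1) / prob(lambda b: b[1] == 1)
print(pairwise_ok, cond_two, cond_one)
```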


(ii) In Example 7.1(iii) we constructed an infinite sequence of
pairwise independent and identically distributed r.v.s $\{X_n, n \ge 1\}$
where $X_n$ takes the values 1, 2, 3 with probability $\frac13$ each. Thus we
have a random sequence such
that $p_{ij} = P[X_{n+1} = j \mid X_n = i] = \frac13$ for all possible $i, j$. The
Chapman-Kolmogorov equation is trivially satisfied with
$p_{ij}^{(n)} = \frac13$, $n \ge 1$. However, the sequence $\{X_n, n \ge 1\}$ is not
Markovian. To see this, suppose that at time $n = 1$ we have $X_1 = 2$.
Then a transition to state 3 at the next step is possible iff the initial
state was 1. Hence the transitions following the first step depend
not only on the present state but also on the initial state. This
means that the Markov property is violated although the
Chapman-Kolmogorov equation is satisfied.


(iii) Every $N \times N$ stochastic matrix $P$ defines the transition
probabilities of a Markov process with discrete time. Its $n$-step
transition probabilities satisfy the Chapman-Kolmogorov equation
which can be written as the semigroup relation $P^{m+n} = P^m P^n$. Now
we are going to show that for $N \ge 3$ there is a non-Markov process
with $N$ states whose transition probabilities satisfy the same
equation.

Let $\Omega_1$ be the sample space whose points $(x^{(1)}, \dots, x^{(N)})$ are
the random permutations of $(1, \dots, N)$, each with probability $1/N!$.
Let $\Omega_2$ be the set of the $N$ points $(x^{(1)}, \dots, x^{(N)})$ such that
$x^{(j)} = \nu$ for all $j$, where $\nu$ is a fixed number of the set $\{1, \dots, N\}$.
Each point in $\Omega_2$ has probability $1/N$. Let $\Omega$ be the mixture of
$\Omega_1$ and $\Omega_2$ with $\Omega_1$ carrying weight $1 - 1/N$ and $\Omega_2$ weight $1/N$.
More formally, $\Omega$ contains $N! + N$ arrangements $(x^{(1)}, \dots, x^{(N)})$
which represent either a permutation of $(1, \dots, N)$ or the $N$-fold
repetition of the integer $\nu$, $\nu = 1, \dots, N$. To each point of the first
class we attribute probability $(1 - N^{-1})/N!$; to each point of the
second class, probability $N^{-2}$. Then clearly

$$P[x^{(i)} = \nu] = N^{-1}, \qquad P[x^{(i)} = \nu, x^{(j)} = \mu] = N^{-2}, \quad i \ne j.$$

Thus all transition probabilities of the sequence constructed above
are the same, namely

$$P[x^{(j)} = \nu \mid x^{(i)} = \mu] = N^{-1}.$$

If $x^{(1)} = 1$, $x^{(2)} = 1$, then $P[x^{(3)} \ne 1] = 0$, which means that the
Markov property is not satisfied. Nevertheless the
Chapman-Kolmogorov equation is satisfied.
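For small $N$ the construction can be verified by brute force. The following sketch enumerates the $N! + N$ points for $N = 3$ (the value of $N$ and the helper names are choices made here for illustration) and checks the stated probabilities exactly.

```python
from fractions import Fraction as F
from itertools import permutations
from math import factorial

N = 3
space = {}
# First class: the N! permutations, each with probability (1 - 1/N) / N!.
for perm in permutations(range(1, N + 1)):
    space[perm] = (1 - F(1, N)) / factorial(N)
# Second class: the N constant arrangements, each with probability 1/N^2.
for v in range(1, N + 1):
    space[(v,) * N] = F(1, N * N)

def prob(event):
    return sum(p for point, p in space.items() if event(point))

total = prob(lambda x: True)
single_ok = all(prob(lambda x: x[i] == v) == F(1, N)
                for i in range(N) for v in range(1, N + 1))
pair_ok = all(prob(lambda x: x[i] == v and x[j] == u) == F(1, N * N)
              for i in range(N) for j in range(N) if i != j
              for v in range(1, N + 1) for u in range(1, N + 1))
# Given x1 = x2 = 1 the point must be the constant arrangement (1, 1, 1).
cond = prob(lambda x: x[:3] == (1, 1, 1)) / prob(lambda x: x[0] == 1 and x[1] == 1)
print(total, single_ok, pair_ok, cond)
```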


20.2. Non-Markov processes which are functions of
Markov processes

If $X = (X_t, t \ge 0)$ is a Markov process with state space $(E, \mathcal{E})$ and $g$
is a one-one mapping of $E$ into $E$, then $Y = (g(X_t), t \ge 0)$ is again a
Markov process. However, if $g$ is not a one-one function, the
Markov property may not hold. Let us illustrate this possibility by
a few examples.


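(i) Let $\{X_n, n = 0, 1, 2, \dots\}$ be a Markov chain with state space $E
= \{1, 2, 3\}$, transition matrix

$$P = \begin{pmatrix} 0 & \frac12 & \frac12 \\ \frac13 & \frac14 & \frac{5}{12} \\ \frac23 & \frac14 & \frac{1}{12} \end{pmatrix}$$

and initial distribution $r = (\frac13, \frac13, \frac13)$. It is easy to see that the
chain $\{X_n\}$ is stationary.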

Consider now the new process $\{Y_n, n = 0, 1, 2, \dots\}$ where $Y_n =
g(X_n)$ and $g$ is a given function on $E$. Suppose the states $i$ of $X$ on
which $g$ equals some fixed constant are collapsed into a single state
of the new process $Y$, called an aggregated process. The collection
of states on which $g$ takes the value $x$ will be called the set of states
$S_x$. It is obvious that only non-empty sets of states are of interest.

For the Markov chain given above let us collapse the set of
states $S$ consisting of 1, 2 into one state. Then it is not difficult to
find that

$$P[X_{m+2} \in S, X_{m+1} \in S \mid X_m \in S] = \tfrac{29}{96} \ne (P[X_{m+1} \in S \mid X_m \in S])^2 = (\tfrac{13}{24})^2.$$

This relation implies that the new process $Y$ is not Markov.
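The two numbers in this relation can be recomputed directly. The sketch below assumes the transition matrix with rows $(0, \frac12, \frac12)$, $(\frac13, \frac14, \frac{5}{12})$, $(\frac23, \frac14, \frac{1}{12})$ and the uniform initial distribution, which is how the example is read here; the helper names are illustrative.

```python
from fractions import Fraction as F

# Assumed data of the example (states 1, 2, 3 are indexed 0, 1, 2).
P = [[F(0), F(1, 2), F(1, 2)],
     [F(1, 3), F(1, 4), F(5, 12)],
     [F(2, 3), F(1, 4), F(1, 12)]]
r = [F(1, 3)] * 3
S = (0, 1)  # the collapsed set {1, 2}

# Stationarity: r P = r.
rP = [sum(r[i] * P[i][j] for i in range(3)) for j in range(3)]

p_S = sum(r[i] for i in S)
# P[X_{m+1} in S | X_m in S]
one_step = sum(r[i] * P[i][j] for i in S for j in S) / p_S
# P[X_{m+2} in S, X_{m+1} in S | X_m in S]
two_step = sum(r[i] * P[i][j] * P[j][k] for i in S for j in S for k in S) / p_S

print(rP == r, one_step, two_step, one_step ** 2)
```

If the aggregated process were a homogeneous Markov chain, `two_step` would have to equal `one_step ** 2`.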


(ii) Let $\{X_n, n = 0, 1, \dots\}$ be a stationary Markov chain with the
following state space $E = \{1, 2, 3, 4\}$ and $n$-step transition matrices


Ld, ea ‘O 1-10 
|e ae 0 0 OO 
(n) _ 1 n 
. clare. |” foe 10 
|e es 0 8 Ge 
| 1-1 1-1 
; NV" 1 0 0 0 0 
\/2 /2 0 0 0 O 
—-1 1-1 1] 
where $n = 1, 2, \dots$ and $\lambda, \lambda'$ are real numbers sufficiently small in
absolute value. Take the function $g : E \to \{1, 2\}$ such that $g(1) =
g(2) = 1$, $g(3) = g(4) = 2$, and consider the aggregated process $\{Y_n,
n = 0, 1, \dots\}$ where $Y_n = g(X_n)$. If $Q^{(n)}$ denotes the $n$-step
transition matrix of $Y$, we find
J+a0( 4-4). 


Q”) = ( 


It turns out that $Q^{(n)}$ does not depend on $\lambda'$ and it is easy to check
that $Q^{(n)}$, $n \ge 1$, satisfy the Chapman-Kolmogorov equation.
However, the relation


ee ed 
ble Bole 
Bole! Boe 
bole bole 


P[Yo =1,¥%1 =1, ¥2 = 1 = $(1 +244) 
implies that the sequence $\{Y_n, n \ge 0\}$ is not Markov when $\lambda \ne \lambda'$.


(iii) Consider two Markov chains, $X_1$ and $X_2$, with the same state
space $E$ and initial distribution $r$, and with transition matrices $P_1$
and $P_2$ respectively. Define a new process, say $X$, with the same
state space $E$, initial distribution $r$ and $n$-step transition matrix
$P^{(n)} = \frac12 P_1^n + \frac12 P_2^n$. Then it can be shown that the process $X$,
which is called a mixture of $X_1$ and $X_2$, is not Markov.


(iv) Let $w = (w_t, t \ge 0)$ be a standard Wiener process. Consider the
processes

$$M = (M_t, t \ge 0), \quad M_t = \max_{0 \le s \le t} w_s, \ t \ge 0, \qquad Y = M - w.$$

Then obviously the process $M$ is not Markov. According to a result
by Freedman (1971), see also Revuz and Yor (1991), $Y$ is a
Markov process distributed as $|w|$, where $|w|$ is called a Wiener
process with a reflecting barrier at 0. Since the Wiener process $w$
itself is a Markov process, we have the relation

$$M = Y + w.$$

Here the right-hand side is a sum of two Markov processes but the
left-hand side is a process which is not Markov. In other words, the
sum of two Markov processes need not be a Markov process. Note,
however, that the sum of two independent Markov processes
preserves this property.


20.3. Comparison of three kinds of ergodicity of Markov
chains

Let $X = \{X_n, n = 0, 1, \dots\}$ be a non-stationary Markov chain with
state space $E$ ($E$ is a countable set, finite or infinite). The chain $X$ is
described completely by the initial distribution
$f^{(0)} = (f_j^{(0)}, j \in E)$ and the sequence $\{P_n, n \ge 1\}$ of the transition
matrices.

If $\lim_{n \to \infty} p_{ij}^{(k,k+n)} = \pi_j$ exists for all $j \in E$ independently of $i$,
$\pi_j > 0$ and $\sum_{j \in E} \pi_j = 1$, we say that the chain $X$ is ergodic and
$(\pi_j, j \in E)$ is its ergodic distribution.


Introduce the following notation:

$$f^{(m)} = f^{(0)} P_1 P_2 \cdots P_m, \qquad f^{(k,m)} = f^{(0)} P_{k+1} P_{k+2} \cdots P_m, \qquad P^{(k,m)} = P_{k+1} P_{k+2} \cdots P_m.$$

The Markov chain $X$ is called weakly ergodic if for all $k \in \mathbb{N}$,

(1)  $\lim_{m \to \infty} \sup_{f^{(0)},\, g^{(0)}} \| f^{(k,m)} - g^{(k,m)} \| = 0$

where $f^{(0)} = (f_j^{(0)}, j \in E)$ and $g^{(0)} = (g_j^{(0)}, j \in E)$ are
arbitrary initial distributions of $X$.

The chain $X$ is called strongly ergodic if there is a probability
distribution $q = (q_j, j \in E)$ such that for all $k \in \mathbb{N}$,

(2)  $\lim_{m \to \infty} \sup_{f^{(0)}} \| f^{(k,m)} - q \| = 0.$

(In (1) and (2) the norm of the vector $x = (x_j, j \in E)$ is defined by
$\|x\| = \sum_{j \in E} |x_j|$.)

Now we can easily make a distinction between the ergodicity,
the weak ergodicity and the strong ergodicity in the case of
stationary Markov chains.

For every Markov chain we can introduce the so-called
$\delta$-coefficient. If $P = (p_{ij})$ is the transition matrix of the chain we put

$$\delta(P) = 1 - \inf_{i,k} \sum_{j \in E} \min\{p_{ij}, p_{kj}\}.$$

This coefficient is effectively used for studying Markov chains and
will be used in the examples below.

Our aim is to compare the three notions of ergodicity introduced
above. Obviously, strong ergodicity implies weak ergodicity. Thus
the first question is whether the converse is true. According to a
result by Isaacson and Madsen (1976), if the state space $E$ is finite,
the ergodicity and the weak ergodicity are equivalent notions. The
second question is: what happens if $E$ is infinite? The examples
below will answer these and other related questions.


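(i) Let $\{X_n\}$ be a non-stationary Markov chain with

$$P_{2n-1} = \begin{pmatrix} \frac12 & \frac12 \\ 1 & 0 \end{pmatrix}, \qquad P_{2n} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad n = 1, 2, \dots.$$

We can easily see that $\delta(P_{2n}) = 1$ and $\delta(P_{2n-1}) = \frac12$. Hence for all
$k$,

$$\delta(P^{(k,m)}) \le \prod_{j=k+1}^{m} \delta(P_j) \le (1/2)^{[(m-k)/2]} \to 0 \quad \text{as } m \to \infty.$$

However, the condition $\delta(P^{(k,m)}) \to 0$ for all $k$ as $m \to \infty$ is
necessary and sufficient for any Markov chain $X$ to be weakly
ergodic (see Isaacson and Madsen 1976). Therefore the Markov
chain considered here is weakly ergodic. Let us determine whether
$X$ is strongly ergodic. Take $f^{(0)} = (0, 1)$ as an initial distribution.
Then

$$f^{(2k)} = f^{(0)}(P_1 P_2)(P_3 P_4) \cdots (P_{2k-1} P_{2k}) = (0, 1)\begin{pmatrix} \frac12 & \frac12 \\ 0 & 1 \end{pmatrix}^k = (0, 1)$$

and

$$f^{(2k+1)} = f^{(0)} P_1 (P_2 P_3) \cdots (P_{2k} P_{2k+1}) = (1, 0)\begin{pmatrix} 1 & 0 \\ \frac12 & \frac12 \end{pmatrix}^k = (1, 0).$$

Hence $\|f^{(2j)} - f^{(2j+1)}\| = 2$ for $j = k, k+1, \dots$ and the sequence
$\{f^{(m)}\}$ does not converge. Therefore the chain $X$ is not strongly
ergodic.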
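These computations are easy to reproduce. The sketch below assumes the matrices $P_{2n-1}$ with rows $(\frac12, \frac12)$, $(1, 0)$ and $P_{2n}$ with rows $(0, 1)$, $(1, 0)$, as the example is read here, and uses the coefficient $\delta(P) = 1 - \inf_{i,k} \sum_j \min\{p_{ij}, p_{kj}\}$; function names are illustrative.

```python
from fractions import Fraction as F

def delta(p):
    """delta(P) = 1 - min over row pairs (i, k) of sum_j min(p_ij, p_kj)."""
    rows = range(len(p))
    return 1 - min(sum(min(a, b) for a, b in zip(p[i], p[k]))
                   for i in rows for k in rows)

def step(dist, p):
    """Row vector times matrix: distribution after one transition."""
    return [sum(dist[i] * p[i][j] for i in range(len(p))) for j in range(len(p[0]))]

P_odd = [[F(1, 2), F(1, 2)], [F(1), F(0)]]   # assumed P_{2n-1}
P_even = [[F(0), F(1)], [F(1), F(0)]]        # assumed P_{2n}

f = [F(0), F(1)]                             # f^(0) = (0, 1)
history = []                                 # f^(1), f^(2), ...
for m in range(1, 9):
    f = step(f, P_odd if m % 2 == 1 else P_even)
    history.append(tuple(f))

print(delta(P_odd), delta(P_even))
print(history)
```

The distributions alternate between $(1, 0)$ and $(0, 1)$, so the distance between consecutive terms stays equal to 2.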


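(ii) Again, let $\{X_n\}$ be a non-stationary Markov chain with

$$P_{2n-1} = \begin{pmatrix} 1 - \frac{1}{2n-1} & \frac{1}{2n-1} \\ 1 - \frac{1}{2n-1} & \frac{1}{2n-1} \end{pmatrix}, \qquad P_{2n} = \begin{pmatrix} \frac{1}{2n} & 1 - \frac{1}{2n} \\ \frac{1}{2n} & 1 - \frac{1}{2n} \end{pmatrix}, \qquad n = 1, 2, \dots.$$

Then

$$f^{(0)} P^{(k,m)} = \begin{cases} (1 - \frac{1}{m}, \frac{1}{m}), & \text{if } m \text{ is odd} \\ (\frac{1}{m}, 1 - \frac{1}{m}), & \text{if } m \text{ is even.} \end{cases}$$

It is not difficult to check that condition (1) is satisfied while
condition (2) is violated. Therefore this Markov chain is weakly,
but not strongly, ergodic.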


(iii) Let $\{X_n\}$ be a stationary Markov chain with infinite state space
$E$ and transition matrix


fs 53 0 0 0 
+ 0 ¥ 0 0 
P=|0 $= 0 |; 0 
00 20 4 


»- &« © # * © 2 8 2 8 & # F F F FF FF HF 8 FF 


It can be shown that this chain is irreducible, positive recurrent and
aperiodic. Hence it is ergodic (see Doob 1953; Chung 1960), that
is, independently of the initial distribution $f^{(0)}$,
$\lim_{m \to \infty} f^{(0)} P^{(m)} = \pi$ exists and $\pi = (\pi_j, j \in E)$ is a probability
distribution. However, $\delta(P^{(m)}) = 1$ for all $m$, which implies that the
chain is not weakly ergodic.


(iv) Since the condition $\delta(P^{(k,m)}) \to 0$ for all $k$ as $m \to \infty$ is
necessary and sufficient for the weak ergodicity of non-stationary
Markov chains, it is natural to ask how this condition is expressed
in the stationary case.

Let $\{X_n\}$ be a stationary Markov chain with transition matrix $P$.
Then we have $P^{(k,m)} = P^{(m-k)}$ and for the $\delta$-coefficient we find

$$\delta(P^{(k,m)}) = \delta(P^{(m-k)}) \le [\delta(P)]^{m-k}.$$

This means that the condition $\delta(P) < 1$ is sufficient for the chain to
be weakly ergodic. However, this condition is not necessary.
Indeed, let


$$P = \begin{pmatrix} 0 & 1 & 0 \\ \frac12 & 0 & \frac12 \\ \frac13 & \frac13 & \frac13 \end{pmatrix}.$$


The Markov chain with this $P$ is irreducible, aperiodic and positive
recurrent. Therefore (see Isaacson and Madsen 1976) the chain is
weakly ergodic. At the same time $\delta(P) = 1$.


20.4. Convergence of functions of an ergodic Markov 
chain 


Let $X = (X_n, n \ge 0)$ be a Markov chain with countable state space $E$
and $n$-step transition matrix $(p_{ij}^{(n)})$. Let $\pi_j = \lim_{n \to \infty} p_{ij}^{(n)}$ be the
ergodic distribution of $X$ and $g : E \to \mathbb{R}^1$ be a bounded and
measurable function. We are interested in the conditions under
which the following relation holds:

(1)  $\lim_{n \to \infty} E[g(X_n)] = \sum_{j \in E} g(j)\pi_j.$

One of the possible answers, given by Holewijn and Hordijk
(1975), can be formulated as follows. Let $X$ be an irreducible,
positive recurrent and aperiodic Markov chain with values in the
space $E$. Denote by $(\pi_j, j \in E)$ its ergodic distribution. Suppose that
the function $g$ is non-negative and is such that $\sum_{j \in E} \pi_j g(j) < \infty$.
Additionally, suppose that for some $i_0 \in E$, $P[X_0 = i_0] = 1$. Then
relation (1) does hold.

Our aim now is to show that the condition $X_0 = i_0$ is essential. In
particular, this condition cannot be replaced by the assumption that
$X$ has some proper distribution over the whole space $E$ at the time
$n = 0$.

Consider the Markov chain $X = (X_n, n \ge 0)$ which takes values in
the set $E = \{0, 1, 2, \dots\}$ and has the following transition
probabilities:

$$p_{0j} = q^j p, \ j = 0, 1, 2, \dots, \qquad p_{i,i-1} = 1, \ i = 1, 2, \dots, \qquad p_{ij} = 0 \text{ otherwise}$$

where $0 < p < 1$, $q = 1 - p$. A direct calculation shows that

$$p_{ij}^{(n)} = q^j p, \ \text{if } i = 0, 1, \dots, n - 1, \ j = 0, 1, 2, \dots; \qquad p_{ij}^{(n)} = 1, \ \text{if } i \ge n, \ j = i - n; \qquad p_{ij}^{(n)} = 0 \text{ otherwise}.$$

The chain $X$ is irreducible, aperiodic and positive recurrent and its
ergodic distribution $(\pi_j, j \in E)$ is given by

$$\pi_j := \lim_{n \to \infty} p_{ij}^{(n)} = q^j p.$$
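That $\pi_j = q^j p$ is indeed stationary can be confirmed in one line (a supplementary check, not part of the original argument): state $j$ is entered either from state 0, with probability $q^j p$, or from state $j + 1$, with probability 1, so

```latex
\sum_{i \ge 0} \pi_i\, p_{ij} = \pi_0\, q^j p + \pi_{j+1}
  = p \cdot q^j p + q^{j+1} p
  = q^j p\,(p + q) = q^j p = \pi_j .
```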


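Suppose now $g$ is a function on $E$ satisfying the following
condition: $\sum_{j=0}^{\infty} q^j |g(j)| < \infty$. Suppose also that $(r_j, j \in E)$ is the
initial distribution of the chain $X$. Then

$$E[g(X_n)] = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} r_i\, p_{ij}^{(n)} g(j) = (r_0 + \dots + r_{n-1}) \sum_{j=0}^{\infty} p q^j g(j) + \sum_{j=0}^{\infty} r_{n+j}\, g(j).$$

Clearly, $E[g(X_n)]$ will converge to $\sum_{j=0}^{\infty} p q^j g(j)$ as $n \to \infty$ iff

$$\lim_{n \to \infty} \sum_{j=0}^{\infty} r_{n+j}\, g(j) = 0.$$

Now we can make our choice of $g$ and $(r_j, j \in E)$. Let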


0, if 7 =0 
H=7, f=012.. md 4S 6 ff j=1,2 


_ <5 es om 


Then $\sum_{j=0}^{\infty} q^j |g(j)| < \infty$ and obviously for all $n \ge 0$ we get

$$\sum_{j=0}^{\infty} r_{n+j}\, g(j) = \infty.$$

Hence a relation like (1) is not possible.
Therefore in general we cannot replace the condition $P[X_0 = i_0]
= 1$ by another one assuming that $X_0$ is a non-degenerate r.v. with
an arbitrary distribution over the whole state space $E$.


20.5. A useful property of independent random variables 
which cannot be extended to stationary Markov 
chains 


It is well known that sequences of independent r.v.s obey several 
interesting properties (see Chung 1974; Stout 1974a; Petrov 1975). 
It turns out that the independence condition is essential for the 
validity of many of these properties. 

Let us formulate the following result: if $\{X_n, n \ge 1\}$ is a
sequence of i.i.d. r.v.s and $E X_1 = \infty$ then $\limsup_{n \to \infty}(X_n/n) = \infty$
a.s.

This result is a consequence of the Borel-Cantelli lemma (see
Tanny 1974; O'Brien 1982). Our aim now is to clarify whether a
similar result holds for a 'weakly' dependent random sequence.
We shall treat the case of $\{X_n, n \ge 0\}$ forming a stationary Markov
chain.

Let $X = (X_n, n \ge 0)$ be a stationary Markov chain with state
space $E = \{1, 2, \dots\}$ and

$$P[X_n = k] = 1/(k(k+1)), \qquad k = 1, 2, \dots, \ n = 0, 1, 2, \dots,$$

$$P[X_n = k + 1 \mid X_{n-1} = k] = k/(k+2), \qquad P[X_n = 1 \mid X_{n-1} = k] = 2/(k+2), \qquad k = 1, 2, \dots, \ n = 1, 2, \dots.$$

It is easy to see that $E X_n = \infty$ for each $n$. However, we have $X_n \le
X_0 + n$ a.s. for all $n$, which implies that $P[\limsup_{n \to \infty}(X_n/n) \le 1] =
1$. Hence $\limsup_{n \to \infty}(X_n/n)$ is not infinity as in the case of
independent r.v.s. Using a result by O'Brien (1982) we conclude
that $\limsup_{n \to \infty}(X_n/n) = 0$ a.s.

Therefore we have constructed a stationary Markov chain such
that for each $n$, $E X_n = \infty$ but with $\limsup_{n \to \infty}(X_n/n) = 0$ a.s.
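The stationarity of the one-dimensional distribution under these transition probabilities can be checked exactly. In the sketch below the truncation level `K` is an arbitrary choice made for illustration; the mass flowing into state 1 from the states $1, \dots, K$ telescopes to $\frac12 - \frac{1}{(K+1)(K+2)}$, which tends to $P[X_n = 1] = \frac12$.

```python
from fractions import Fraction as F

def pi(k):
    """Stationary probability P[X_n = k] = 1 / (k (k + 1))."""
    return F(1, k * (k + 1))

def up(k):
    """P[X_n = k + 1 | X_{n-1} = k] = k / (k + 2)."""
    return F(k, k + 2)

def reset(k):
    """P[X_n = 1 | X_{n-1} = k] = 2 / (k + 2)."""
    return F(2, k + 2)

K = 200  # truncation level (illustrative)

# Balance at every state k + 1 >= 2: only state k leads there.
balance_ok = all(pi(k) * up(k) == pi(k + 1) for k in range(1, K + 1))
# Mass entering state 1 from the states 1..K.
into_one = sum(pi(k) * reset(k) for k in range(1, K + 1))
# Partial sums of E X_n = sum_k k * pi(k) = sum_k 1/(k+1) grow without bound.
partial_mean = sum(k * pi(k) for k in range(1, K + 1))

print(balance_ok, into_one, float(partial_mean))
```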


20.6. The partial coincidence of two continuous-time 
Markov chains does not imply that the chains are 
equivalent 

Let $X = (X_t, t \ge 0)$ and $\tilde{X} = (\tilde{X}_t, t \ge 0)$ be homogeneous
continuous-time Markov chains with the same state space $E$, the
same initial distribution and transition probabilities $p_{ij}(t)$ and
$\tilde{p}_{ij}(t)$ respectively. If $p_{ij}(t) = \tilde{p}_{ij}(t)$, $i, j \in E$, for infinitely many $t$,
but not for all $t \ge 0$, we say that $X$ and $\tilde{X}$ coincide partially. If
moreover we have $p_{ij}(t) = \tilde{p}_{ij}(t)$, $i, j \in E$, for all $t \ge 0$, then the
processes $X$ and $\tilde{X}$ are equivalent (stochastically equivalent) in the
sense that each one is a modification of the other (see Example
19.3).

Firstly, let us note that the transition probabilities of any
continuous-time Markov chain satisfy two systems of differential
equations which are called Kolmogorov equations (see Chung
1960; Gihman and Skorohod 1974/1979). These equations are
written in terms of the corresponding infinitesimal matrix $Q = (q_{ij})$
and under some natural conditions they uniquely define the
transition probabilities $p_{ij}(t)$, $t \ge 0$.

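Let $X$ and $\tilde{X}$ be Markov chains each taking values in the set $\{1,
2, 3\}$. Suppose $X$ and $\tilde{X}$ are defined by their infinitesimal matrices
$Q = (q_{ij})$ and $\tilde{Q} = (\tilde{q}_{ij})$ respectively, where

$$Q = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 1 & 0 & -1 \end{pmatrix}, \qquad \tilde{Q} = \begin{pmatrix} -1 & \frac12 & \frac12 \\ \frac12 & -1 & \frac12 \\ \frac12 & \frac12 & -1 \end{pmatrix}.$$

Thus, knowing $Q$ and $\tilde{Q}$ and using the Kolmogorov equations,
we can show that the transition probabilities $p_{ij}(t)$ and $\tilde{p}_{ij}(t)$ have
the following explicit form:

$$p_{11}(t) = p_{22}(t) = p_{33}(t) = \tfrac13 + \tfrac23 e^{-3t/2} \cos(\sqrt{3}t/2),$$
$$p_{12}(t) = p_{23}(t) = p_{31}(t) = \tfrac13 + \tfrac23 e^{-3t/2} \cos(\sqrt{3}t/2 - 2\pi/3),$$
$$p_{13}(t) = p_{21}(t) = p_{32}(t) = \tfrac13 + \tfrac23 e^{-3t/2} \cos(\sqrt{3}t/2 + 2\pi/3),$$
$$\tilde{p}_{11}(t) = \tilde{p}_{22}(t) = \tilde{p}_{33}(t) = \tfrac13 + \tfrac23 e^{-3t/2},$$
$$\tilde{p}_{ij}(t) = \tfrac13 - \tfrac13 e^{-3t/2}, \quad \text{if } i \ne j, \ i, j = 1, 2, 3.$$

(The details are left to the reader.)
Obviously, $p_{ij}(t) = \tilde{p}_{ij}(t)$ for every $t = 4k\pi/\sqrt{3}$, $k \in \mathbb{N}$, but for
all other $t$ we have $p_{ij}(t) \ne \tilde{p}_{ij}(t)$. Therefore the processes $X$ and $\tilde{X}$
are not equivalent, though they partially coincide.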
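The partial coincidence can also be observed numerically. The sketch below assumes the infinitesimal matrices $Q$ (cyclic, rows $(-1, 1, 0)$, $(0, -1, 1)$, $(1, 0, -1)$) and $\tilde{Q}$ (off-diagonal entries $\frac12$), computes $e^{tQ}$ with a plain scaling-and-squaring Taylor series, and compares the two transition matrices at $t = 4\pi/\sqrt{3}$ and at $t = 1$. The routine `mat_exp` is an illustrative helper written for this sketch, not a library function.

```python
import math

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=20):
    """exp(t * q): Taylor series of exp(t*q / 2^s), then s repeated squarings."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

Q = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]
Q_tilde = [[-1.0, 0.5, 0.5], [0.5, -1.0, 0.5], [0.5, 0.5, -1.0]]

def max_gap(t):
    """Largest entrywise difference between the two transition matrices at time t."""
    p, p_tilde = mat_exp(Q, t), mat_exp(Q_tilde, t)
    return max(abs(p[i][j] - p_tilde[i][j]) for i in range(3) for j in range(3))

t_star = 4 * math.pi / math.sqrt(3)
print(max_gap(t_star), max_gap(1.0))
```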


20.7. Markov processes, Feller processes, strong Feller 
processes and relationships between them 

Let $X = (X_t, t \ge s, P_{s,x})$ be a Markov family: that is, $(X_t, t \ge s)$ is a
Markov process with respect to the probability measure $P_{s,x}$,
$P_{s,x}[X_s = x] = 1$ and $P(s, x; t, \Gamma)$, $t \ge s$, $x \in \mathbb{R}^1$, $\Gamma \in \mathcal{B}^1$ is its transition
function. As usual, $\mathbf{B} = \mathbf{B}(\mathbb{R}^1)$ and $\mathbf{C} = \mathbf{C}(\mathbb{R}^1)$ denote the set of
all bounded and measurable functions on $\mathbb{R}^1$, and the set of all
bounded and continuous functions on $\mathbb{R}^1$, respectively. By the
equality

$$P^{st} g(x) = E_{s,x}[g(X_t)] = \int_{\mathbb{R}^1} g(y)\,P(s, x; t, dy)$$

we define on $\mathbf{B}$ a semigroup of operators $\{P^{st}\}$. Obviously we have
the inclusion $P^{st}\mathbf{B} \subset \mathbf{B}$; moreover, $P^{st}\mathbf{C} \subset \mathbf{B}$. A Markov process for
which $P^{st}\mathbf{C} \subset \mathbf{C}$ is called a Feller process. This means that for each
continuous and bounded function $g$, the function $P^{st}g(x)$ is
continuous in $x$. In other words,

$$\int_{\mathbb{R}^1} g(y)\,P(s, x; t, dy) \to \int_{\mathbb{R}^1} g(y)\,P(s, x_0; t, dy) \quad \text{as } x \to x_0, \ x_0 \in \mathbb{R}^1,$$

which is equivalent to the weak continuity of $P(\cdot)$ with respect to
the second argument (the starting point of the process).

Let us introduce another notion. If for each $g \in \mathbf{B}$ the function
$P^{st}g(x)$ is continuous in $x$, the Markov process is called a strong
Feller process. Clearly, the assumption for a process to be strong
Feller is more restrictive than that for a process to be Feller. Thus,
every strong Feller process is also a Feller process. However the
converse is not always true. There are Markov processes which are
not Feller processes, although the condition for a process to be
Feller seems very natural and not too strong.


(i) Let the family $X = (X_t, t \ge s, P_{s,x})$ describe the motion for $t \ge s$ of a particle starting at time $s$ from the position $X_s = x$: if $X_s < 0$, the particle moves to the left with unit rate; if $X_s > 0$ it moves to the right with unit rate; if $X_s = 0$, the particle moves to the left or to the right with probability $\tfrac12$ for each of these two directions. Formally this can be expressed by:

$$\begin{aligned}
&P_{s,x}[X_t = x + (t - s),\ t \ge s] = 1, \quad \text{if } x > 0,\\
&P_{s,x}[X_t = x - (t - s),\ t \ge s] = 1, \quad \text{if } x < 0,\\
&P_{s,0}[X_t = t - s,\ t \ge s] = P_{s,0}[X_t = -(t - s),\ t \ge s] = \tfrac12.
\end{aligned}$$

It is easy to see that $X = (X_t, t \ge s, P_{s,x})$ is a Markov family. Further, if $g$ is a continuous and bounded function, we find explicitly that

$$P^{st} g(x) = \begin{cases} g(x + (t - s)), & \text{if } x > 0,\\ g(x - (t - s)), & \text{if } x < 0,\\ \tfrac12 g(t - s) + \tfrac12 g(-(t - s)), & \text{if } x = 0. \end{cases}$$

Since in general $P^{st}g(x)$ has a discontinuity at $x = 0$, it follows from this that $X$ is not a Feller process even though it is a Markov process.
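The discontinuity at $x = 0$ can be seen concretely. The following minimal sketch evaluates $P^{st}g$ from the explicit three-case formula; the test function $g = \arctan$ and the names `semigroup_apply` and `elapsed` (standing for $t - s$) are illustrative assumptions, not taken from the text.

```python
import math


def semigroup_apply(g, x, elapsed):
    """P^{st} g(x) for the left/right unit-rate motion, elapsed = t - s >= 0."""
    if x > 0:
        return g(x + elapsed)
    if x < 0:
        return g(x - elapsed)
    # Starting at 0 the particle goes left or right with probability 1/2 each.
    return 0.5 * g(elapsed) + 0.5 * g(-elapsed)


g = math.atan          # bounded and continuous (illustrative choice)
elapsed = 1.0
right_limit = semigroup_apply(g, 1e-9, elapsed)    # x -> 0 from the right
left_limit = semigroup_apply(g, -1e-9, elapsed)    # x -> 0 from the left
value_at_zero = semigroup_apply(g, 0.0, elapsed)
print(right_limit, left_limit, value_at_zero)
```

For this $g$ the one-sided limits are close to $\pi/4$ and $-\pi/4$, while the value at $0$ is $0$, so $P^{st}g$ is not continuous.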


(ii) It is easy to give an example of a process which is Feller but not strong Feller. Indeed, by the formula

$$P(t, x, \Gamma) = 1_\Gamma(x + vt), \quad t \ge 0,\ x \in \mathbb{R}^1,\ \Gamma \in \mathcal{B}^1,\ v = \text{constant} > 0$$

we define a transition function which corresponds to a homogeneous Markov process. Actually, this $P$ describes a motion with constant velocity $v$. All that remains is to check that the process is Feller but is not strong Feller.
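One possible way to carry out this check is sketched below; the indicator test function $g_0$ is an illustrative choice and is not taken from the text.

```latex
% The semigroup acts by a shift of the argument:
P^{t} g(x) = \int_{\mathbb{R}^1} g(y)\, P(t, x, dy) = g(x + vt).
% If g is bounded and continuous, then x \mapsto g(x + vt) is bounded and
% continuous, so the process is Feller.
% Take g_0 = 1_{[0,\infty)}, which is bounded and measurable but not continuous:
P^{t} g_0(x) = 1_{[0,\infty)}(x + vt) =
\begin{cases} 1, & x \ge -vt, \\ 0, & x < -vt, \end{cases}
% which has a jump at x = -vt, so the process is not strong Feller.
```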


20.8. Markov but not strong Markov processes 


In the introductory notes to this section we defined the Markov 
property of a stochastic process. For the examples below we need 
another property called the strong Markov property. For simplicity 
we consider the homogeneous case. 

Let $X = (X_t, t \ge 0)$ be a real-valued Markov process defined on the probability space $(\Omega, \mathcal{F}, P)$ and $(\mathcal{F}_t, t \ge 0)$ be its own generated filtration which is assumed to satisfy the usual conditions. Let $\tau$ be an $(\mathcal{F}_t)$-stopping time and $\mathcal{F}_\tau$ be the $\sigma$-field of all events $A \in \mathcal{F}$ such that $A \cap [\tau \le t] \in \mathcal{F}_t$ for all $t \ge 0$.

Suppose the Markov process $X$ is $(\mathcal{F}_t)$-progressive, and let $\eta$ be an $\mathcal{F}_\tau$-measurable non-negative r.v. defined on the set $[\omega : \tau(\omega) < \infty]$. Then $X$ is said to be a strong Markov process if, for any $\Gamma \in \mathcal{B}^1$,

$$P[X_{\tau+\eta} \in \Gamma \mid \mathcal{F}_\tau] = P[X_{\tau+\eta} \in \Gamma \mid X_\tau] \quad \text{a.s.}$$

This relation defines the strong Markov property. In terms of the transition function $P(t, x, \Gamma)$ it can be written in the form

$$P[X_{\tau+\eta} \in \Gamma \mid \mathcal{F}_\tau] = P(\eta, X_\tau, \Gamma) \quad \text{a.s.} \tag{1}$$

If $(X_t, t \ge 0, P_x)$ is a homogeneous Markov family (also see Example 20.5), the strong Markov property can be expressed by

$$P_x\{A \cap [X_{\tau+\eta} \in \Gamma]\} = \int_A P(\eta, X_\tau, \Gamma)\, P_x(d\omega), \tag{2}$$

for $A \in \mathcal{F}_\tau$, $A \subset \{\omega : \tau(\omega) < \infty,\ \eta(\omega) < \infty\}$.

Two examples of processes which are Markov but not strong Markov are now given. Case (i) is the first ever known example of such a process, proposed by A. A. Yushkevich (see Dynkin and Yushkevich 1956).


(i) Let $w = (w_t, t \ge 0, P_x)$ be a Wiener process which can start from any point $x \in \mathbb{R}^1$. Define a new process $X = (X_t, t \ge 0)$ by

$$X_t = \begin{cases} w_t, & \text{if } w_0 \ne 0,\\ 0, & \text{if } w_0 = 0. \end{cases}$$

Then $X$ is a homogeneous Markov process whose transition function is

$$P(t, x, \Gamma) = \begin{cases} (2\pi t)^{-1/2} \int_\Gamma \exp[-(u - x)^2/(2t)]\, du, & \text{if } x \ne 0,\\ \delta_0(\Gamma), & \text{if } x = 0. \end{cases}$$

(Here $\delta_0(\cdot)$ is a unit measure concentrated at the point 0.)

Let us check the usual Markov property for $X$. Clearly, $P$ satisfies all the conditions for a transition function. We then need to establish the relation

$$P_x\{A \cap [X_{t+h} \in \Gamma]\} = \int_A P(h, X_t, \Gamma)\, P_x(d\omega) \tag{3}$$

for $t, h \ge 0$, $A \in \mathcal{F}_t$ and $\Gamma \in \mathcal{B}^1$. (Note that by equation (3) we express the Markov property of the family $(X_t, t \ge 0, P_x)$, while the strong Markov property is given by (2).) If $x \ne 0$, then $X_t = w_t$ a.s. and (3) is reduced to the Markov property of the Wiener process. If $x = 0$, (3) is trivial since both sides are simultaneously either 1 or 0. Hence $X$ is a Markov process.

Let us take $x \ne 0$, $\tau = \inf\{t : X_t = 0\}$, $\eta = (1 - \tau) \vee 0$, $A = \{\tau \le 1\}$ and $\Gamma = \mathbb{R}^1 \setminus \{0\}$. Then obviously $\tau < \infty$ a.s., $\eta < \infty$ a.s. Suppose $X$ is strong Markov. Then the following relation would hold (see (2)):

$$P_x\{A \cap [X_{\tau+\eta} \in \Gamma]\} = \int_A P(\eta, X_\tau, \Gamma)\, P_x(d\omega). \tag{4}$$

However, on the set $A$ we have $\tau + \eta = 1$, so the left-hand side is equal to

$$P_x\{\tau \le 1,\ X_1 \ne 0\} = P_x\{\tau \le 1\} = 2[1 - \Phi(|x|)] > 0,$$

while the right-hand side is 0 because $X_\tau = 0$ and $P(\eta, 0, \Gamma) = \delta_0(\Gamma) = 0$. Thus (4) is not valid and therefore the Markov process $X$ is not strong Markov.


(ii) Let $\tau$ be a r.v. distributed exponentially with parameter 1, that is $P[\tau > t] = e^{-t}$. Define the process $X = (X_t, t \ge 0)$ by

$$X_t = X_t(\omega) = \max\{0, t - \tau(\omega)\}$$

and let $\mathcal{F}_t = \sigma\{X_s, s \le t\}$, $t \ge 0$, be its own generated filtration.

It is easy to see that if $X_t = a > 0$ for some $t$, then $X_{t+s} = a + s$ for all $s > 0$. If for some $t$ we have $X_t = 0$, then $X_s$ must be zero for $s < t$, so the past does not provide new information. Thus we conclude that $X$ is a Markov process. Denote its transition function by $P(t, x, \Gamma)$. Let us show that $X$ is not a strong Markov process. Indeed, the relation $[\omega : \tau(\omega) \ge t] = [\omega : X_t(\omega) = 0] \in \mathcal{F}_t$ shows that the r.v. $\tau$ is an $(\mathcal{F}_t)$-stopping time. For a given $s$ we have $P[X_{\tau+s} = s] = 1$. Therefore

$$P[X_{\tau+s} < x \mid \mathcal{F}_\tau] = \begin{cases} 1, & \text{if } x > s,\\ 0, & \text{if } x \le s. \end{cases} \tag{5}$$

If $X$ were strong Markov, then the following relation would hold:

$$P[X_{\tau+s} < x \mid \mathcal{F}_\tau] = P[X_{\tau+s} < x \mid X_\tau] \quad \text{a.s.} \tag{6}$$

and the conditional probability on the right-hand side could be expressed according to (1) by the transition function $P(t, x, \Gamma)$, namely

$$P[X_{\tau+s} < x \mid X_\tau] = P(s, X_\tau, \Gamma_x) \quad (\text{where } \Gamma_x = [0, x)).$$

However,

$$P(s, X_\tau, \Gamma_x) = P(s, 0, [0, x)) = P[X_{t+s} < x \mid X_t = 0] = \begin{cases} 1, & \text{if } x > s,\\ e^{-(s - x)}, & \text{if } 0 < x \le s. \end{cases} \tag{7}$$

From (5) and (7) it follows that (6) is not satisfied. The process $X$ is therefore Markov but not strong Markov.
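The mismatch between (5) and (7) can be illustrated by simulation. The sketch below is a rough Monte Carlo check under assumptions not in the text: the values $t = 1$, $s = 2$, $x = 1$, the sample size and the seed are arbitrary, and the function name `conditional_prob_from_zero` is hypothetical. It estimates $P[X_{t+s} < x \mid X_t = 0]$ and compares it with $e^{-(s-x)}$.

```python
import math
import random


def conditional_prob_from_zero(t, s, x, n_samples=200_000, seed=0):
    """Estimate P[X_{t+s} < x | X_t = 0] for X_u = max(0, u - tau),
    where tau is exponential with parameter 1."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n_samples):
        tau = rng.expovariate(1.0)
        if tau >= t:                          # the event {X_t = 0}
            total += 1
            if max(0.0, t + s - tau) < x:     # the event {X_{t+s} < x}
                hits += 1
    return hits / total


t, s, x = 1.0, 2.0, 1.0
estimate = conditional_prob_from_zero(t, s, x)
exact = math.exp(-(s - x))                   # value in (7) for 0 < x <= s
at_stopping_time = 1.0 if x > s else 0.0     # value in (5): X_{tau+s} = s
print(estimate, exact, at_stopping_time)
```

Starting from the state 0 at a fixed time, the event $[X_{t+s} < x]$ has positive probability, whereas after the stopping time $\tau$ it has probability 0 for $x \le s$.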


20.9. Can a differential operator of order k > 2 be an
infinitesimal operator of a Markov process?


Let $P(t, x, \Gamma)$, $t \ge 0$, $x \in \mathbb{R}^1$, $\Gamma \in \mathcal{B}^1$ be the transition function of a homogeneous Markov process $X = (X_t, t \ge 0)$ and $\{P^t, t \ge 0\}$ be the semigroup of operators associated with $P$:

$$P^t u(x) = \int_{\mathbb{R}^1} u(y)\, P(t, x, dy).$$

The infinitesimal operator corresponding to $\{P^t\}$ (and also to $P$ and to $X$) is denoted by $A$ and is defined by:

$$Au(x) = \lim_{t \downarrow 0} \frac1t \left[ \int_{\mathbb{R}^1} u(y)\, P(t, x, dy) - u(x) \right]. \tag{1}$$

Let $D(A)$ be the domain of $A$, that is $D(A)$ contains all functions $u(x)$, $x \in \mathbb{R}^1$ for which the limit in (1) exists in the sense of convergence in norm in the space where $\{P^t\}$ is considered.

Several important results concerning the infinitesimal operators of Markov processes and related topics can be found in the book by Dynkin (1965). In particular, Dynkin proves that under natural conditions the infinitesimal operator $A$ is a differential operator of first or second order. In the latter case $D(A) = C^2(\mathbb{R}^1)$, the space of twice continuously differentiable functions. So we come to the following question: can a differential operator of order $k > 2$ be an infinitesimal operator?

Suppose the answer to this question is positive in the particular case $k = 3$ and let $Au(x) = u'''(x)$ with $D(A) = C^3(\mathbb{R}^1)$, the space of three times continuously differentiable functions $u(x)$, $x \in \mathbb{R}^1$.

However, if $A$ is an infinitesimal operator, then according to the Hille-Yosida theorem (see Dynkin 1965; Wentzell 1981) $A$ must satisfy the minimum principle: if $u \in D(A)$ attains its minimum at the point $x_0$, then $Au(x_0) \ge 0$.

Take the function $u(x) = 2(\sin 2\pi x)^2 - (\sin 2\pi x)^3$. Obviously $u \in C^3(\mathbb{R}^1)$ and it is a periodic function with period 1. It is easy to see that $u$ takes its minimal value at $x = 0$ and, moreover, $u'''(0) < 0$. This implies that the minimum principle is violated. Thus in general a differential operator of order $k = 3$ cannot be an infinitesimal operator. Similar arguments can be used in the case $k > 3$.
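The two facts used about $u$ can be verified numerically. The following sketch checks on a grid that $u \ge 0 = u(0)$ over one period and approximates $u'''(0)$ by a central finite difference; the step size and the helper name `third_derivative` are assumptions for illustration. Expanding $u$ in powers of $x$ near 0 gives the $x^3$ coefficient $-(2\pi)^3$, hence $u'''(0) = -48\pi^3$.

```python
import math


def u(x):
    s = math.sin(2 * math.pi * x)
    return 2 * s ** 2 - s ** 3


def third_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f'''(x), error O(h^2)."""
    return (f(x + 2 * h) - 2 * f(x + h) + 2 * f(x - h) - f(x - 2 * h)) / (2 * h ** 3)


grid_min = min(u(k / 1000) for k in range(1000))   # one full period
u3_at_zero = third_derivative(u, 0.0)
exact_u3 = -48 * math.pi ** 3
print(grid_min, u3_at_zero, exact_u3)
```

So $u$ has a minimum at $x_0 = 0$ while $Au(0) = u'''(0) < 0$, which is exactly the violation of the minimum principle.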


SECTION 21. STATIONARY PROCESSES AND SOME 
RELATED TOPICS 


Let $X = (X_t, t \in T \subset \mathbb{R}^1)$ be a real-valued stochastic process defined on the probability space $(\Omega, \mathcal{F}, P)$. We say that $X$ is strictly stationary if for each $n \ge 1$ and $t_k$, $t_k + h \in T$, $k = 1, \ldots, n$, the random vectors

$$(X_{t_1}, \ldots, X_{t_n}) \quad \text{and} \quad (X_{t_1+h}, \ldots, X_{t_n+h})$$

have the same distribution.

Suppose now that $X$ is an $L^2$-process (or second-order process), that is $E[X_t^2] < \infty$ for each $t \in T$. Such a process $X$ is said to be weakly stationary if $EX_t = c = $ constant for all $t \in T$ and the covariance function $C(s, t) = E[(X_s - c)(X_t - c)]$ depends on $s$ and $t$ only through $t - s$. This means that there is a function $C(\cdot)$ of one argument $t$, $t \in T$, such that $C(t) = E[(X_s - c)(X_{s+t} - c)]$ for all $s$, $s + t \in T$.

On the same lines we can define weak and strict stationarity for 
multi-dimensional processes and for complex-valued processes. 
The notions of strict and weak stationarity were introduced by 
Khintchine (1934). 

Let us note that the covariance function $C$ of any weakly stationary process admits the so-called spectral representation. If $T = \mathbb{R}^1$ or $T = \mathbb{R}_+$ we have a continuous-time weakly stationary process and its covariance function $C$ has the representation

$$C(t) = \int_{-\infty}^{\infty} e^{it\lambda}\, dF(\lambda)$$

where $F(\lambda)$, $\lambda \in \mathbb{R}^1$ is a non-decreasing, right-continuous and bounded function. $F$ is called a spectral d.f., while its derivative $f$, if it exists, is called a spectral density function. If $T = \mathbb{N}$ or $T = \mathbb{Z}$ we say that $X = (X_n)$ is a discrete-time weakly stationary process (or a weakly stationary sequence). In this case the covariance function $C$ of $X$ has the representation

$$C(n) = \int_{-\pi}^{\pi} e^{in\lambda}\, dF(\lambda)$$

where $F(\lambda)$, $\lambda \in [-\pi, \pi]$ possesses properties as in the continuous case.

Note that many useful properties of stationary processes and sequences can be derived under conditions in terms of $C$, $F$ and $f$. It is important to note that stationary processes themselves also admit spectral representations in the form of integrals of the type $\int_{-\infty}^{\infty} e^{it\lambda}\, dZ(\lambda)$ with respect to processes with orthogonal increments.

Let $X = (X_n, n \in \mathbb{N})$ be a strictly stationary process. Denote by $\mathcal{M}_a^b$ the $\sigma$-field generated by the r.v.s $X_a, X_{a+1}, \ldots, X_b$. Without going into details here we note that in terms of probabilities of events belonging to the $\sigma$-fields $\mathcal{M}_{-\infty}^{k}$ and $\mathcal{M}_{k+n}^{\infty}$ we can define some important conditions, such as $\varphi$-mixing, strong mixing, regularity and absolute regularity, which are essential in studying stationary processes. In the examples below we give definitions of these conditions and analyse properties of the processes.

A complete presentation of the theory of stationary processes 
and several related topics can be found in the books by Parzen 
(1962), Cramer and Leadbetter (1967), Rozanov (1967), Gihman 
and Skorohod (1974/1979), Ibragimov and Linnik (1971), Ash and 
Gardner (1975), Ibragimov and Rozanov (1978) and Wentzell 
(1981). 

In this section we consider only a few examples dealing with the 
stationarity property, as well as properties such as mixing and 
ergodicity. 


21.1. On the weak and the strict stationary properties of 
stochastic processes 


Since we shall be studying two classes of stationary processes, it is useful to clarify the relationship between them.

Firstly, if $X = (X_t, t \in \mathbb{R}^1)$ is a strictly stationary process, and moreover $X$ is an $L^2$-process, then clearly $X$ is also a weakly stationary process. However, $X$ can be strictly stationary without being weakly stationary and this is the case when $X$ is not an $L^2$-process. It is easy to construct examples of this.

Further, let $\theta$ be a r.v. with a uniform distribution on $[0, 2\pi]$ and let $Z_t = \sin \theta t$. Then the random sequence $(Z_n = \sin \theta n,\ n = 1, 2, \ldots)$ is weakly but not strictly stationary, while the process $(Z_t = \sin \theta t,\ t \in \mathbb{R}^1)$ is neither weakly nor strictly stationary. If we take another r.v., say $\zeta$, which has an arbitrary distribution and does not depend on $\theta$, then the process $Y = (Y_t, t \in \mathbb{R}^1)$ where $Y_t = \cos(\zeta t + \theta)$ is both weakly and strictly stationary.
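The moment claims about $Z_t = \sin \theta t$ can be checked by direct numerical integration over $\theta$. The sketch below uses a midpoint rule; the function names and the particular indices ($n = 3$, the pair $(2, 5)$, $t = 0.5$) are illustrative assumptions. For integer indices the mean is 0 and the second moments are $\tfrac12 \delta_{nm}$, while for $t = 0.5$ the mean equals $(1 - \cos 2\pi t)/(2\pi t) = 2/\pi$, so the mean of the continuous-time process is not constant.

```python
import math


def uniform_mean(func, n_points=20_000):
    """Midpoint-rule approximation of E[func(theta)], theta ~ U[0, 2*pi]."""
    h = 2 * math.pi / n_points
    return sum(func((k + 0.5) * h) for k in range(n_points)) / n_points


def mean_z(t):
    return uniform_mean(lambda th: math.sin(th * t))


def second_moment(t1, t2):
    return uniform_mean(lambda th: math.sin(th * t1) * math.sin(th * t2))


mean_int = mean_z(3)                 # integer index: mean 0
var_int = second_moment(3, 3)        # integer index: variance 1/2
cov_int = second_moment(2, 5)        # distinct integer indices: covariance 0
mean_half = mean_z(0.5)              # non-integer t: mean 2/pi, not 0
print(mean_int, var_int, cov_int, mean_half)
```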
Let us consider two other examples of weakly but not strictly stationary processes. Let $\xi_1$ and $\eta_1$ be r.v.s each distributed $N(0,1)$ and such that the distribution of $(\xi_1, \eta_1)$ is not bivariate normal, and $\xi_1$ and $\eta_1$ are uncorrelated. Such examples exist and were described in Section 10. Now take an infinite sequence of independent copies of $(\xi_1, \eta_1)$, that is

$$(\xi_1, \eta_1),\ (\xi_2, \eta_2),\ \ldots$$

which in this order are renamed $X_1, X_2, \ldots$, that is,

$$X_1 = \xi_1,\quad X_2 = \eta_1,\quad X_3 = \xi_2,\quad X_4 = \eta_2,\ \ldots.$$

It is easy to check that the random sequence $(X_n, n = 1, 2, \ldots)$ is weakly stationary but not strictly stationary.

Finally, it is not difficult to construct a continuous-time process $X = (X_t, t \in \mathbb{R}^1)$ with similar properties. For $t \ge 0$ take $X_t$ to be a r.v. distributed $N(1,1)$ and for $t < 0$ let $X_t$ be exponentially distributed with parameter 1. Suppose also that for all $s \ne t$ the r.v.s $X_s$ and $X_t$ are independent. Then $X$ is a weakly but not strictly stationary process.


21.2. On the strict stationarity of a given order 


Recall that the process $X = (X_t, t \in T \subset \mathbb{R}^1)$ is said to be strictly stationary of order $m$ if for arbitrary $t_1, \ldots, t_m \in T$ and $t_1 + h, \ldots, t_m + h \in T$ the random vectors $(X_{t_1}, \ldots, X_{t_m})$ and $(X_{t_1+h}, \ldots, X_{t_m+h})$ have the same distribution. Clearly, the process $X$ is strictly stationary if it is so of any order $m$, $m \ge 1$. It is easy to see that the $m$-order strictly stationary process $X$ is also $k$-order strictly stationary for every $k$, $1 \le k < m$. The following example determines if the converse is true.

Let $\xi$ and $\eta$ be independent r.v.s with the same non-degenerate d.f. $F(x)$, $x \in \mathbb{R}^1$. Define the sequence $(X_n, n = 1, 2, \ldots)$ as follows:

$$X_1 = \xi,\ X_2 = \xi,\ X_3 = \xi,\ X_4 = \eta,\ X_5 = \eta,\ X_6 = \xi,\ X_7 = \xi,\ X_8 = \xi,\ X_9 = \eta,\ X_{10} = \eta,\ X_{11} = \xi,\ \ldots.$$

This means that

$$X_n = \begin{cases} \xi, & \text{if } n = 5k + 1,\ 5k + 2,\ 5k + 3,\\ \eta, & \text{if } n = 5k + 4,\ 5k + 5, \end{cases} \qquad \text{for } k = 0, 1, \ldots.$$

It is obvious that the sequence $(X_n, n = 1, 2, \ldots)$ is strictly stationary of order 1. Let us check if it is strictly stationary of order 2. E.g. the random vectors $(X_1, X_2)$, $(X_2, X_3)$, $(X_4, X_5)$, $(X_6, X_7)$, ... are identically distributed. However, $(X_3, X_4)$, that is, $(X_{1+2}, X_{2+2})$, has a distribution which differs from that of $(X_1, X_2)$. Indeed, since $X_1 = \xi$, $X_2 = \xi$, $X_3 = \xi$ and $X_4 = \eta$ we have, for suitable $x_1, x_2$,

$$\begin{aligned}
P[X_1 \le x_1, X_2 \le x_2] &= P[\xi \le x_1, \xi \le x_2] = P[\xi \le \min\{x_1, x_2\}]\\
&= F(\min\{x_1, x_2\}) \ne F(x_1) F(x_2) = P[\xi \le x_1, \eta \le x_2] = P[X_3 \le x_1, X_4 \le x_2].
\end{aligned}$$

Therefore the sequence $(X_n, n = 1, 2, \ldots)$ is not strictly stationary of order 2. It is clear what conclusion can be drawn in the general case.
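A small exact computation makes the failure at order 2 explicit. In the sketch below $\xi$ and $\eta$ are taken to be independent Bernoulli($\tfrac12$) variables; this choice, and the names `x_value` and `joint_law`, are assumptions made only for illustration.

```python
from fractions import Fraction
from itertools import product


def x_value(n, xi, eta):
    """X_n = xi if n = 5k+1, 5k+2, 5k+3 and X_n = eta if n = 5k+4, 5k+5."""
    return xi if n % 5 in (1, 2, 3) else eta


def joint_law(indices):
    """Exact joint law of (X_n, n in indices) for Bernoulli(1/2) xi, eta."""
    law = {}
    for xi, eta in product((0, 1), repeat=2):
        key = tuple(x_value(n, xi, eta) for n in indices)
        law[key] = law.get(key, Fraction(0)) + Fraction(1, 4)
    return law


one_dim_laws = [joint_law((n,)) for n in range(1, 11)]
law_12 = joint_law((1, 2))   # (xi, xi): mass only on the diagonal
law_34 = joint_law((3, 4))   # (xi, eta): product measure
print(law_12, law_34)
```

All one-dimensional laws coincide, but the law of $(X_1, X_2)$ is concentrated on the diagonal while that of $(X_3, X_4)$ is the product law.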


21.3. The strong mixing property can fail if we consider a 
functional of a strictly stationary strong mixing process 


Suppose $X = (X_n, n \in \mathbb{N})$ is a strictly stationary process satisfying the strong mixing condition. This means that there is a numerical sequence $\alpha(n) \downarrow 0$ as $n \to \infty$ such that

$$\sup_{A, B} |P(AB) - P(A)P(B)| \le \alpha(n)$$

where sup is taken over all events $A \in \mathcal{M}_{-\infty}^{k}$, $B \in \mathcal{M}_{k+n}^{\infty}$.

This condition is essential for establishing limit theorems for sequences of weakly dependent r.v.s (see Rosenblatt 1956, 1978; Ibragimov 1962; Ibragimov and Linnik 1971; Billingsley 1968).

Let $g(x)$, $x \in \mathbb{R}^1$, be a measurable function and $\xi = (\xi_n, n \in \mathbb{N})$ a strictly stationary process. Then the process $(X_n, n \in \mathbb{N})$ where $X_n = g(\xi_n)$ is again strictly stationary (see e.g. Breiman 1968).

Suppose now $\xi = (\xi_n, n \in \mathbb{N})$ is strongly mixing and $g(x)$, $x \in \mathbb{R}^\infty$ is a bounded and $\mathcal{B}^\infty$-measurable function. If we define the process $X = (X_n, n \in \mathbb{N})$ by $X_n = g(\xi_n, \xi_{n+1}, \ldots)$, the question to consider is whether the functional $g$ preserves the strong mixing property. In general the answer is negative and this is shown in the next example.

Let $\{\varepsilon_j, j \in \mathbb{N}\}$ be a sequence of i.i.d. r.v.s such that $P[\varepsilon_j = 1] = P[\varepsilon_j = 0] = \tfrac12$. Define the random sequence $(X_j, j \in \mathbb{N})$ where

$$X_j = 2^{-1}\varepsilon_j + 2^{-2}\varepsilon_{j+1} + \ldots + 2^{-k}\varepsilon_{j+k-1} + \ldots, \quad j \in \mathbb{N}.$$

Since $\{\varepsilon_j\}$ consists of i.i.d. r.v.s, $\{\varepsilon_j\}$ is a strictly stationary sequence. This implies that the sequence $(X_j, j \in \mathbb{N})$ is also strictly stationary. Moreover, $\{\varepsilon_j\}$ satisfies the strong mixing condition and thus we could expect that $(X_j, j \in \mathbb{N})$ satisfies this condition.


Suppose this conjecture is right. Then according to Ibragimov (1962) the sequence of d.f.s

$$F_n(x) = P[(X_1 + \ldots + X_n)/b_n - a_n \le x], \quad x \in \mathbb{R}^1,\ n = 1, 2, \ldots$$

where $a_n$ and $b_n$ are norming constants, can converge as $n \to \infty$ only to a stable law. (For properties of stable distributions see Section 9.) Moreover, if the limit law of $F_n$ has a parameter $\alpha$, then necessarily $b_n = (V[X_1 + \ldots + X_n])^{1/2} = n^{1/\alpha} h(n)$ where $h(n)$ is a slowly varying function.

Consider the random sequence

$$g_j = \sum_{k=1}^{\infty} r_k(X_j)\, k^{-3/4}$$

where $r_k(x)$, $k = 1, 2, \ldots$ are the Rademacher functions: $r_k(X_1) = \operatorname{sign} \sin(2^k \pi X_1)$ or $r_k = -1 + 2\varepsilon_k$ ($\varepsilon_k$ as above). Since $r_k$, $k \ge 1$ are i.i.d. r.v.s, $P[r_k = \pm 1] = \tfrac12$, we can easily see that

$$E[g_j g_{j+l}] = \sum_{k=1}^{\infty} k^{-3/4}(k + l)^{-3/4} \ge 2^{-3/4}\, l^{-3/4}$$

and

$$\sigma_n^2 = E\Big[\Big(\sum_{j=1}^{n} g_j\Big)^2\Big] \ge n^{5/4}(1 + o(1)).$$

Moreover, as a consequence of our assumption that $\{X_j\}$ is strongly mixing, the sequence $\{g_j\}$ must satisfy the CLT, that is

$$P[(g_1 + \ldots + g_n)/\sigma_n \le x] \to \Phi(x), \quad x \in \mathbb{R}^1 \quad \text{as } n \to \infty.$$

However, as the variance $\sigma_n^2$ is greater than $n^{5/4}(1 + o(1))$ it cannot be represented in the form $n h(n)$ with $h(n)$ a slowly varying function. This contradiction shows that the strictly stationary process $(X_j, j \in \mathbb{N})$ defined above does not satisfy the strong mixing condition. This would be interesting even if we could only conclude that not every strictly stationary process is strongly mixing. Clearly, the example considered here provides a little more: the functional $(X_j)$ of a strictly stationary and strong mixing process $(\varepsilon_j)$ may not preserve the strong mixing property.


21.4. A strictly stationary process can be regular but not 
absolutely regular 


Let $X = (X_t, t \in \mathbb{R}^1)$ be a strictly stationary process. We say that $X$ is regular if the $\sigma$-field

$$\mathcal{M}_{-\infty} = \bigcap_{t \in \mathbb{R}^1} \mathcal{M}_{-\infty}^{t}$$

is trivial, that is if $\mathcal{M}_{-\infty}$ contains only events of probability 0 or 1. This condition can be expressed also in another form: for all $B \in \mathcal{M}_{-\infty}^{0}$ and $A \in \mathcal{M}_{t}^{\infty}$ we have

$$\sup_A |P(AB) - P(A)P(B)| \to 0 \quad \text{as } t \to \infty.$$

Further, define $\rho(t) := \sup E[\eta_1 \eta_2]$ where sup is taken over all r.v.s $\eta_1$ and $\eta_2$ such that $\eta_1$ is $\mathcal{M}_{-\infty}^{s}$-measurable, $\eta_2$ is $\mathcal{M}_{s+t}^{\infty}$-measurable, $E\eta_1 = 0$, $E\eta_2 = 0$, $E[\eta_1^2] = 1$, $E[\eta_2^2] = 1$. The quantity $\rho(t)$, $t \ge 0$ is called a maximal correlation coefficient between the $\sigma$-fields $\mathcal{M}_{-\infty}^{s}$ and $\mathcal{M}_{s+t}^{\infty}$. The process $X$ is said to be absolutely regular (completely, strictly regular) if $\rho(t) \to 0$ as $t \to \infty$. Note that for stationary processes which are also Gaussian, the notion of absolute regularity coincides with the so-called strong mixing condition (see Ibragimov and Rozanov 1978).

It is obvious that any absolutely regular process is also regular. We now consider whether the converse is true.


Suppose $X$ is a strictly stationary process and $f(\lambda)$, $\lambda \in \mathbb{R}^1$, is its spectral density function. Then $X$ is regular iff

$$\int_{-\infty}^{\infty} \frac{\log f(\lambda)}{1 + \lambda^2}\, d\lambda > -\infty. \tag{1}$$

For the proof of this result and of many others we again refer the reader to the book by Ibragimov and Rozanov (1978).

Consider now a stationary process $X$ whose spectral density is

(2) FOS Gar A7 + TI) (A sim)? 


with $p$ any positive integer. Then it is not difficult to check that $f$ given by (2) satisfies (1). Hence $X$ is a regular process. However, the process $X$ and its spectral density $f$ do not satisfy another condition which is necessary for a process to be absolutely regular (Ibragimov and Rozanov 1978, Th. 6.4.3). Thus we conclude that the stationary process $X$ with spectral density $f$ given by (2) is not absolutely regular even though it is regular.


21.5. Weak and strong ergodicity of stationary processes 


Let $X = (X_n, n \ge 1)$ be a weakly stationary sequence with $EX_n = 0$, $n \ge 1$. We say that $X$ is weakly ergodic (or that $X$ satisfies the WLLN) if

$$\frac1n \sum_{k=1}^{n} X_k \xrightarrow{P} 0 \quad \text{as } n \to \infty. \tag{1}$$

If (1) holds with probability 1, we say that $X$ is strongly ergodic (or that $X$ satisfies the SLLN).

If $X = (X_t, t \ge 0)$ is a weakly stationary (continuous-time) process with mean $EX_t = 0$ and

$$\frac1T \int_0^T X_t\, dt \xrightarrow{P} 0 \quad \text{as } T \to \infty \tag{2}$$

then $X$ is said to be weakly ergodic (to obey the WLLN); $X$ is strongly ergodic if (2) is satisfied with probability 1 (now $X$ obeys the SLLN).

There are many results concerning the ergodicity of stationary 
processes. The conditions guaranteeing a certain type of ergodicity 
can be expressed in different terms. Here we discuss two examples 
involving the covariance functions and the spectral d.f. 


(i) Let $X = (X_n, n \ge 1)$ be a weakly stationary sequence such that $EX_n = 0$, $E[X_n^2] = 1$ and the covariance function is $C(n) = E[X_k X_{k+n}]$. Then the condition

$$\lim_{n \to \infty} C(n) = 0 \tag{3}$$

is sufficient for the process $X$ to be weakly ergodic (see Gihman and Skorohod 1974/1979; Gaposhkin 1973). Note that (3) also implies that $(1/n) \sum_{k=1}^{n} X_k \xrightarrow{L^2} 0$ which means that (3) is a sufficient condition for $X$ to be $L^2$-ergodic. Moreover, if we suppose additionally that $X$ is strictly stationary, then it can be shown that condition (3) implies the strong ergodicity of $X$. Thus we come to the question: if $X$ is only weakly stationary, can condition (3) ensure that $X$ is strongly ergodic? It turns out that in general the answer is negative. It can be proved that there exists a weakly stationary sequence $X = (X_n, n \ge 1)$ such that its covariance function $C(n)$ satisfies the condition

$$C(n) = O[(\log \log n)^{-2}] \quad \text{as } n \to \infty$$

(hence $C(n) \to 0$) so that $X$ is weakly ergodic but $(1/n) \sum_{k=1}^{n} X_k$ diverges almost surely. Note that the construction of such a process as well as of a similar continuous-time process is given by Gaposhkin (1973).


(ii) We now consider a weakly stationary process $X = (X_t, t \in \mathbb{R}_+)$ with $EX_t = 0$, $E[X_t^2] = 1$ and covariance function $C(t) = E[X_s X_{s+t}]$ and discuss the conditions which ensure the strong ergodicity of $X$.

Firstly let us formulate the following result (see Verbitskaya 1966). If the covariance function $C$ satisfies the condition


cae | ; 
/ 7 C(H) (log t)* de rage. e 


then the process $X$ is strongly ergodic. Moreover, if the process $X$


is bounded, then the condition Jj © Cat 0045 si fricient tor 
the strong ergodicity of X. 

Obviously this result contains conditions which are only 
sufficient for strong ergodicity. However, it is of general interest to 
look for necessary and sufficient conditions under which a 
stationary process will be strongly ergodic. The above result and 
other results in this area lead naturally to the conjecture that 
eventually such conditions can be expressed either as restrictions 
on the covariance function C at infinity, or as restrictions on the 
corresponding spectral d.f. around 0. The following example will 
show if this conjecture is true.

Consider two independent r.v.s, say $\zeta$ and $\theta$, where $\zeta$ has an arbitrary d.f. $F(x)$, $x \in \mathbb{R}^1$ and $\theta$ is uniformly distributed on $[0, 2\pi]$. Let

$$X_t = \sqrt2 \cos(\zeta t + \theta), \quad t \in \mathbb{R}_+.$$

Then the process $X = (X_t, t \in \mathbb{R}_+)$ is both weakly and strictly stationary,

$$EX_t = 0, \quad \text{and} \quad C(t) = \int_{-\infty}^{\infty} \cos(t\lambda)\, dF(\lambda).$$

In particular this explicit form of the covariance function of $X$ shows that the d.f. $F$ of the r.v. $\zeta$ is just the spectral d.f. of the process $X$. Obviously this fact is very convenient when studying the ergodicity of $X$.

Suppose $F$ satisfies only one condition, namely it is continuous at 0:

$$F(0) - F(0-) = 0. \tag{4}$$

Recall that (4) is equivalent to the condition $\lim_{T \to \infty} \frac1T \int_0^T C(t)\, dt = 0$ which implies that $X$ is weakly ergodic (see Gihman and Skorohod 1974/1979). Let us show that $X$ is strongly ergodic. A direct calculation leads to:

$$\frac1T \int_0^T X_t\, dt = \frac{\sqrt2}{T} \int_0^T \cos(\zeta t + \theta)\, dt = O(1/T), \quad \text{if } \zeta \ne 0, \tag{5}$$

$$\frac1T \int_0^T X_t\, dt = \sqrt2 \cos\theta, \quad \text{if } \zeta = 0.$$

However, (4) implies that $P[\zeta = 0] = 0$. From (5) we can then conclude that $X$ is strongly ergodic.
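The direct calculation behind (5) can be made explicit: for $\zeta \ne 0$ the time average equals $\sqrt2\,[\sin(\zeta T + \theta) - \sin\theta]/(\zeta T)$, which is bounded in absolute value by $2\sqrt2/(|\zeta| T)$. The following sketch compares this closed form with a midpoint-rule integral along one sample path; the numerical values of $\zeta$, $\theta$, $T$ and the function names are illustrative assumptions.

```python
import math


def time_average(zeta, theta, horizon, n_steps=200_000):
    """Midpoint-rule value of (1/T) * int_0^T sqrt(2) cos(zeta*t + theta) dt."""
    h = horizon / n_steps
    total = sum(math.sqrt(2) * math.cos(zeta * (k + 0.5) * h + theta)
                for k in range(n_steps))
    return total * h / horizon


def closed_form(zeta, theta, horizon):
    if zeta == 0:
        return math.sqrt(2) * math.cos(theta)
    return (math.sqrt(2) * (math.sin(zeta * horizon + theta) - math.sin(theta))
            / (zeta * horizon))


zeta, theta, horizon = 0.7, 1.1, 50.0
numeric = time_average(zeta, theta, horizon)
exact = closed_form(zeta, theta, horizon)
bound = 2 * math.sqrt(2) / (abs(zeta) * horizon)
print(numeric, exact, bound)
```

On every path with $\zeta \ne 0$ the average tends to 0 at rate $1/T$, which is why only the event $[\zeta = 0]$ matters.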

Note especially that (4) is the only condition imposed on the spectral d.f. $F$ of the process $X$. This example and other arguments given by Verbitskaya (1966) allow us to conclude that in general no restrictions on the spectral d.f. in a neighbourhood of 0 (excluding the continuity of $F$ at 0) could be necessary conditions for the strong ergodicity of a stationary process.


21.6. A measure-preserving transformation which is ergodic 
but not mixing 


Stationary processes possess many properties such as mixing and ergodicity which can be studied in a unified way as transformations of the probability space on which the processes are defined (see Ash and Gardner 1975; Rosenblatt 1979; Shiryaev 1995). We give some definitions first and then discuss an interesting example.

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $T$ a transformation of $\Omega$ into itself. $T$ is called measurable if $T^{-1}(A) = \{\omega : T\omega \in A\} \in \mathcal{F}$ for all $A \in \mathcal{F}$. We say that $T : \Omega \to \Omega$ is a measure-preserving transformation if $P(T^{-1}A) = P(A)$ for all $A \in \mathcal{F}$. If the event $A \in \mathcal{F}$ is such that $T^{-1}A = A$, $A$ is called an invariant event. The class of all invariant events is a $\sigma$-field denoted by $\mathcal{J}$. If for every $A \in \mathcal{J}$ we have $P(A) = 0$ or 1, the measure-preserving transformation $T$ is said to be ergodic. The function $g : (\Omega, \mathcal{F}) \to (\mathbb{R}^1, \mathcal{B}^1)$ is called invariant under $T$ iff $g(T\omega) = g(\omega)$ for all $\omega$. It can easily be shown that the measure-preserving transformation $T$ is ergodic iff every $T$-invariant function is $P$-a.s. constant. Finally, recall that $T$ is said to be a mixing transformation on $(\Omega, \mathcal{F}, P)$ iff for all $A, B \in \mathcal{F}$,

$$\lim_{n \to \infty} P(A \cap T^{-n}B) = P(A)P(B).$$

We now compare two of the notions introduced above, ergodicity and mixing.

Let $T$ be a measure-preserving transformation on the probability space $(\Omega, \mathcal{F}, P)$. Then: (a) $T$ is ergodic iff any $T$-invariant function is $P$-a.s. constant; (b) $T$ mixing implies that $T$ is ergodic (see Ash and Gardner 1975; Rosenblatt 1979; Shiryaev 1995).


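Let $\Omega = [0, 1)$, $\mathcal{F} = \mathcal{B}_{[0,1)}$ and $P$ be the Lebesgue measure. Consider the transformation $T\omega = (\omega + \lambda) \pmod 1$, $\omega \in \Omega$. It is easy to see that $T$ is measure preserving. Thus we want to establish if $T$ is ergodic and mixing.

Suppose first that $\lambda$ is a rational number, $\lambda = k/m$ for some integers $k$ and $m$. Then the set

$$A = \bigcup_{j=0}^{m-1} \left\{ \omega : \frac{j}{m} \le \omega < \frac{j}{m} + \frac{1}{2m} \right\}$$

is invariant and $P(A) = \tfrac12$. This implies that for $\lambda$ rational, the transformation $T$ cannot be ergodic.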

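Now let $\lambda$ be an irrational number. Our goal is to show that in this case $T$ is ergodic. Consider a r.v. $\xi = \xi(\omega)$ on $(\Omega, \mathcal{F}, P)$ with $E[\xi^2] < \infty$. Then (see Ash 1972; Kolmogorov and Fomin 1970) the Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n \omega}$ of the function $\xi(\omega)$ is $L^2$-convergent and $\sum_{n=-\infty}^{\infty} |c_n|^2 < \infty$. Suppose that $\xi$ is an invariant r.v. Since $T$ is measure preserving we find for the Fourier coefficient $c_n$ that

$$\begin{aligned}
c_n &= E[\xi(\omega) e^{-2\pi i n \omega}] = E[\xi(T\omega) e^{-2\pi i n T\omega}]\\
&= e^{-2\pi i n \lambda} E[\xi(T\omega) e^{-2\pi i n \omega}] = e^{-2\pi i n \lambda} E[\xi(\omega) e^{-2\pi i n \omega}] = c_n e^{-2\pi i n \lambda}.
\end{aligned}$$

This implies that $c_n(1 - e^{-2\pi i n \lambda}) = 0$. However, as we have assumed that $\lambda$ is irrational, $e^{-2\pi i n \lambda} \ne 1$ for all $n \ne 0$. Hence $c_n = 0$ if $n \ne 0$ and $\xi(\omega) = c_0$ a.s., $c_0 = $ constant. From statement (a) we can conclude that the transformation $T$ is ergodic.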

It remains for us to show that $T$ is not mixing. Indeed, take the set $A = \{\omega : 0 \le \omega < 1/2\}$ and let $B = A$. Since $T$ is measure preserving and invertible, then for any $n$ we get

$$P(A \cap T^{-n}B) = P(A \cap T^{-n}A) = P(T^n A \cap A).$$

Let us fix a number $\varepsilon \in (0, 1)$. Since $\lambda$ is irrational, then for infinitely many $n$ the difference between $e^{2\pi i n \lambda}$ and $e^{i0} = 1$, in absolute value, does not exceed $\varepsilon$. The sets $A$ and $T^n A$ overlap except for a set of measure less than $\varepsilon$. Thus

$$P(A \cap T^{-n}B) \ge P(A) - \varepsilon$$

and for $0 < \varepsilon < \tfrac18$ we find

$$P(A \cap T^{-n}B) \ge \tfrac38.$$

If the transformation $T$ were mixing, then

$$P(A \cap T^{-n}B) \to P(A)P(B) \quad \text{as } n \to \infty$$

and it should be $P(A)P(B) \ge \tfrac38$. On the other hand, since $P(A) = \tfrac12$,

$$P(A)P(B) = [P(A)]^2 = \tfrac14.$$

Thus we come to a contradiction, so the mixing property of $T$ fails to hold. Therefore, for measure-preserving transformations, mixing is a stronger property than ergodicity.
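The failure of mixing can also be seen numerically. For $A = [0, \tfrac12)$ and a shift by $c \in [0, 1)$, the Lebesgue measure of the intersection of $A$ with its shifted copy is $|\tfrac12 - c|$, so that $P(A \cap T^{-n}A) = |\tfrac12 - \{n\lambda\}|$, where $\{\cdot\}$ denotes the fractional part. The sketch below evaluates this for $\lambda = \sqrt2$ and $n \le 1000$; the choice of $\lambda$, the range of $n$ and the function name `overlap` are assumptions for illustration, and floating-point fractional parts are only approximate.

```python
import math


def overlap(n, lam):
    """P(A intersect T^{-n} A) for A = [0, 1/2), T(w) = (w + lam) mod 1,
    computed as |1/2 - frac(n * lam)|."""
    shift = (n * lam) % 1.0
    return abs(0.5 - shift)


lam = math.sqrt(2)
values = [overlap(n, lam) for n in range(1, 1001)]
largest = max(values)     # close to 1/2: A and T^n A almost coincide
smallest = min(values)    # close to 0: A and T^n A almost disjoint
print(largest, smallest)
```

The values keep oscillating between numbers near $\tfrac12$ and numbers near 0, so they cannot converge to $P(A)P(B) = \tfrac14$.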


21.7. On the convergence of sums of φ-mixing random
variables


It is well known that in the case of independent r.v.s $\{X_n, n \ge 1\}$ the infinite series $\sum_{n=1}^{\infty} X_n$ is convergent simultaneously in distribution, in probability and with probability 1. This statement, called the Lévy equivalence theorem, can be found in a book by Ito (1984) and leads to the question: does a similar result hold for sequences of 'weakly' dependent r.v.s?

Let $\{X_n, n \ge 1\}$ be a stationary random sequence satisfying the so-called $\varphi$-mixing condition. This means that for some numerical sequence $\varphi(n) \downarrow 0$ as $n \to \infty$ we have

$$|P(AB) - P(A)P(B)| \le \varphi(n) P(A)$$

where $A \in \mathcal{M}_1^m$, $B \in \mathcal{M}_{m+n}^{\infty}$, $m \ge 1$, $n \ge 1$ and $P(A) > 0$.

Note that there are several results concerning the convergence of the partial sums $S_n = X_1 + \ldots + X_n$ as $n \to \infty$ of a $\varphi$-mixing sequence (see Stout 1974a, b). Let us formulate the following result from Stout (1974b).

The conditions (a) and (b) below are equivalent:

(a) $\sum_{n=1}^{\infty} X_n$ converges in distribution and $X_n \xrightarrow{d} 0$ as $n \to \infty$;
(b) $\sum_{n=1}^{\infty} X_n$ converges in probability.

Recall that for independent r.v.s a condition like $X_n \xrightarrow{d} 0$ is not involved. Since for $\varphi$-mixing sequences conditions (a) and (b) are equivalent it is clear that removal of the condition $X_n \xrightarrow{d} 0$ will surely make the series $\sum_{n=1}^{\infty} X_n$ not convergent in probability. An illustration follows.
Consider the sequence $\{\xi_n, n \ge 1\}$ of i.i.d. r.v.s with $P[\xi_1 = \pm 1] = \tfrac12$ and let $X_n = \xi_{n+1} - \xi_n$. It is easy to see that the new sequence $\{X_n, n \ge 1\}$ is $\varphi$-mixing with $\varphi(n) = 0$ for all $n \ge 2$. Clearly $S_n = \sum_{k=1}^{n} X_k = \xi_{n+1} - \xi_1$. It follows from here that $S_n$ is convergent in distribution because $S_n$ has the same distribution for all $n$, namely $S_n$ takes the values 2, 0 and $-2$ with probabilities $\tfrac14$, $\tfrac12$ and $\tfrac14$ respectively. Obviously $\{S_n\}$ is not convergent in probability as $n \to \infty$.
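Both claims can be confirmed by exact enumeration. Since $S_n = \xi_{n+1} - \xi_1$ and $S_{n+1} - S_n = \xi_{n+2} - \xi_{n+1}$, only the three signs $\xi_1, \xi_{n+1}, \xi_{n+2}$ matter for any $n \ge 1$. The sketch below (the helper name `law_of` is hypothetical) computes the law of $S_n$ and of the increment.

```python
from fractions import Fraction
from itertools import product


def law_of(statistic):
    """Exact law of statistic(xi_1, xi_{n+1}, xi_{n+2}) for i.i.d. +-1 signs."""
    law = {}
    for signs in product((-1, 1), repeat=3):
        value = statistic(*signs)
        law[value] = law.get(value, Fraction(0)) + Fraction(1, 8)
    return law


law_s_n = law_of(lambda x1, xn1, xn2: xn1 - x1)          # S_n
law_increment = law_of(lambda x1, xn1, xn2: xn2 - xn1)   # S_{n+1} - S_n
prob_big_jump = law_increment[2] + law_increment[-2]
print(law_s_n, prob_big_jump)
```

The law of $S_n$ does not depend on $n$, while $P[|S_{n+1} - S_n| = 2] = \tfrac12$ for every $n$, so $\{S_n\}$ is not a Cauchy sequence in probability.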


21.8. The central limit theorem for stationary random 
sequences 


The classical CLT deals with independent r.v.s (see Section 17). Thus if we suppose that $\{X_n, n \ge 1\}$ is a sequence of 'weakly' dependent r.v.s we cannot expect that without additional assumptions the normed sums $S_n/s_n$ will converge to the standard normal distribution. As usual, $S_n = X_1 + \ldots + X_n$, $s_n^2 = VS_n$. There are works (see Rosenblatt 1956; Ibragimov and Linnik 1971; Davydov 1973; Bradley 1980) where under appropriate conditions the CLT is proved for some classes of stationary sequences (see Ibragimov and Linnik 1971; Bradley 1980) and for stationary random fields (see Bulinskii 1988).

We present below a few examples which show that for stationary sequences the normed sums $S_n/s_n$ can behave differently as $n \to \infty$. In particular, the limit distribution, if it exists, need not be the normal distribution $N(0,1)$.


(i) Let $\xi$ be a r.v. distributed uniformly on $[0, 1]$. Consider the random sequence $\{X_n, n = 0, \pm 1, \ldots\}$ where $X_n = \cos(2\pi n \xi)$. It is easy to see that the variables $X_n$, $n \ne 0$, are uncorrelated (but not independent), so $\{X_n, n \ge 1\}$ forms a weakly stationary sequence. If $S_n = X_1 + \ldots + X_n$ we can easily see that $ES_n = 0$ and $VS_n = \tfrac12 n$. Moreover,

$$S_n = -\frac12 + \frac12\, \frac{\sin(2\pi(n + \tfrac12)\xi)}{\sin(\pi \xi)}.$$

According to a result by Grenander and Rosenblatt (1957), we have

$$S_n \xrightarrow{d} -\frac12 + \frac12\, \frac{\sin(2\pi\eta)}{\sin(\pi \xi)}$$

where $\eta$ is another r.v. uniformly distributed on $[0, 1]$ and independent of $\xi$. Note especially that $S_n$ itself, not the normed quantity $S_n/s_n$, has a limit distribution. Moreover, it is obvious that $S_n/s_n$ does not converge to a r.v. distributed $N(0,1)$.
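The closed form for $S_n$ is the classical identity $\sum_{k=1}^{n} \cos kx = -\tfrac12 + \sin((n + \tfrac12)x) / (2 \sin(x/2))$ with $x = 2\pi\xi$. The sketch below checks it at one point and evaluates $E[S_n^2]$ by a midpoint rule over $\xi \in [0, 1]$; the sample point, the grid size and the function names are illustrative assumptions.

```python
import math


def partial_sum(n, xi):
    """S_n = sum_{k=1}^n cos(2*pi*k*xi)."""
    return sum(math.cos(2 * math.pi * k * xi) for k in range(1, n + 1))


def closed_form(n, xi):
    return (-0.5 + 0.5 * math.sin(2 * math.pi * (n + 0.5) * xi)
            / math.sin(math.pi * xi))


def second_moment_of_sum(n, grid=2000):
    """Midpoint-rule value of E[S_n^2] for xi uniform on [0, 1]."""
    return sum(partial_sum(n, (j + 0.5) / grid) ** 2 for j in range(grid)) / grid


gap = abs(partial_sum(7, 0.3) - closed_form(7, 0.3))
variance_s6 = second_moment_of_sum(6)     # E S_n = 0, so this is V S_6
print(gap, variance_s6)
```

The computed variance agrees with $VS_n = n/2$ for $n = 6$.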


(ii) Consider the sequence of r.v.s $\{X_n, n = 0, \pm 1, \ldots\}$ such that for an arbitrary integer $n$ and non-negative integer $m$, the random vector $(X_n, X_{n+1}, \ldots, X_{n+m})$ has the following density:

$$\begin{aligned}
f(x_n, x_{n+1}, \ldots, x_{n+m}) ={}& \tfrac12 (2\pi)^{-(m+1)/2} \sigma_1^{-(m+1)} \exp\Big[-\tfrac12 \sigma_1^{-2} \sum_{k=0}^{m} x_{n+k}^2\Big]\\
&+ \tfrac12 (2\pi)^{-(m+1)/2} \sigma_2^{-(m+1)} \exp\Big[-\tfrac12 \sigma_2^{-2} \sum_{k=0}^{m} x_{n+k}^2\Big].
\end{aligned}$$

Here $\sigma_1 > 0$, $\sigma_2 > 0$ and we assume that $\sigma_1 \ne \sigma_2$. Obviously $\{X_n\}$ is a strictly stationary sequence. If $S_n = X_1 + \ldots + X_n$, it is not difficult to see that

lim P[S,,/s, < 2] := G(x) = $®(o12) + $®(o22). 

noo = 
Thus the limit distribution $G$ of $S_n/s_n$ is a mixture of two normal distributions and, since $\sigma_1 \ne \sigma_2$, $G$ is not normal.


(iii) Let $\{X_n, n = 0, \pm 1, \ldots\}$ be a strictly stationary sequence with $E[X_n^2] < \infty$ for all $n$. Denote by $\rho(n)$, $n \ge 1$, the maximal correlation coefficient associated with this sequence (see Example 21.4). Recall that in general

$$\rho(n) = \sup_{\eta_1, \eta_2} \big\{ E[(\eta_1 - E\eta_1)(\eta_2 - E\eta_2)] / (V\eta_1\, V\eta_2)^{1/2} \big\}$$

where $\eta_1$ is $\mathcal{M}_{-\infty}^{m}$-measurable, $\eta_2$ is $\mathcal{M}_{m+n}^{\infty}$-measurable, $0 < V\eta_1, V\eta_2 < \infty$, $m$ is any integer and $n \ge 1$. Note that the condition

$$\rho(n) \to 0 \quad \text{as } n \to \infty$$

plays an important role in the theory of stationary processes. In particular, the $\varphi$-mixing condition implies that $\rho(n) \to 0$ and, further, $\rho(n) \to 0$ implies the strong mixing condition (for details see Ibragimov and Linnik 1971).


Suppose {X_n, n = 0, ±1, ...} is a strictly stationary sequence
with EX_n = 0 and E[X_n²] < ∞ for all n. Using the notation S_n = X_1
+ ... + X_n and s_n² = E[S_n²] we formulate the following result of
Ibragimov (1975). If ρ(n) → 0 then either
sup_n s_n² < ∞ or s_n² = nh(n) where h(n) is a slowly varying
function as n → ∞. If s_n² → ∞, ρ(n) → 0 and for some δ > 0,
E[|X_0|^{2+δ}] < ∞, then S_n/s_n → Y in distribution as n → ∞ for a r.v. Y
distributed N(0,1).

Our aim now is to see whether the conditions for this result can
be weakened while preserving the asymptotic normality of S_n/s_n.
In particular, an example will be described in which instead of the
condition E[|X_0|^{2+δ}] < ∞ we have E[|X_0|^{2+δ}] = ∞ for each δ > 0
but E[|X_0|²] < ∞. This example is the main result in a paper by
Bradley (1980) and is formulated as follows.

There exists a strictly stationary sequence {X_n, n = 0, ±1, ...} of
real-valued r.v.s such that: (a) EX_n = 0 and 0 < VX_n < ∞; (b)
s_n² → ∞ as n → ∞; (c) ρ(n) → 0 as n → ∞; (d) for each λ > 0
there is an increasing sequence of positive integers {n(k)} such that

S_{n(k)}/s_{n(k)} → ξ_λ in distribution as k → ∞

where ξ_λ is a r.v. with a d.f. F_λ defined by


| “ 2 ig , 
F(x) = e* 119,00) (x) + me e“/** du, ce’. 
3 Vf 27 Be 
Note that for each fixed λ > 0 the limit distribution F_λ is a Poisson
mixture of normal distributions and has a point-mass at 0. Thus F_λ
is not a normal distribution. Therefore the stationary sequence
constructed above does not satisfy the CLT.

It is interesting to note that F_λ is an infinitely divisible but not a
stable distribution. (Two other distributions with analogous
properties were given in Example 9.7.)

Let us note finally that Herrndorf (1984) constructed an example 
of a stationary sequence (not m-dependent) of mutually 
uncorrelated r.v.s such that the strong mixing coefficient tends to 
zero ‘very fast’ but nevertheless the CLT fails to hold. For more
recent results on this topic see the papers by Janson (1988) and 
Bradley (1989). 


SECTION 22. DISCRETE-TIME MARTINGALES 


Let (X_n, n ≥ 1) be a random sequence defined on the probability
space (Ω, ℱ, P). We are also given the family (ℱ_n, n ≥ 1) of
non-decreasing sub-σ-fields of ℱ, that is ℱ_n ⊂ ℱ for each n and
ℱ_n ⊂ ℱ_{n+1}. As usual, if we write (X_n, ℱ_n, n ≥ 1), this means that
the sequence (X_n) is (ℱ_n)-adapted: X_n is ℱ_n-measurable for each n.
The sequence (X_n, n ≥ 1) is integrable if E|X_n| < ∞ for every n ≥ 1.
If sup_{n≥1} E|X_n| < ∞ we say that the given sequence is
L¹-bounded, while if E[sup_{n≥1} |X_n|] < ∞ the sequence (X_n, n ≥ 1) is
L¹-dominated.

The system (X_n, ℱ_n, n ≥ 1) is called a martingale if E|X_n| < ∞, n
≥ 1 and

(1) E[X_n | ℱ_m] = X_m a.s.

for all m ≤ n. If in (1) instead of equality we have E[X_n | ℱ_m] ≤ X_m
or E[X_n | ℱ_m] ≥ X_m, then we have a supermartingale or a
submartingale respectively.

A stopping time with respect to (ℱ_n) is a function τ : Ω → N ∪
{∞} such that [τ = n] ∈ ℱ_n for all n ≥ 1. Denote by T the set of all
bounded stopping times. Recall that the family (a_τ, τ ∈ T) of real
numbers (such a family is called a net) is said to converge to the
real number b if for every ε > 0 there is τ_0 ∈ T such that for all τ ∈
T with τ ≥ τ_0 we have |a_τ − b| < ε.

Some definitions of systems whose properties are close to those 
of martingales but are in some sense generalizations of them are 
listed below. The random sequence (X_n, ℱ_n, n ≥ 1) is said to be:

(a) a quasimartingale if Σ_{n=1}^∞ E|X_n − E(X_{n+1} | ℱ_n)| < ∞;
(b) an amart if the net (E[X_τ], τ ∈ T) converges;
(c) a martingale in the limit if
sup_{m≥n} |E(X_m | ℱ_n) − X_n| → 0 a.s. as n → ∞;
(d) a game fairer with time if
sup_{m≥n} |E(X_m | ℱ_n) − X_n| → 0 in probability as n → ∞;
(e) a progressive martingale if A_n ⊂ A_{n+1} for n ≥ 1 and
P[∪_{n=1}^∞ A_n] = 1 where A_n = [E(X_{n+1} | ℱ_n) = X_n];
(f) an eventual martingale if P[E(X_{n+1} | ℱ_n) ≠ X_n i.o.] = 0.


Random sequences which possess the martingale,
supermartingale or submartingale properties are of classic
importance in the theory of stochastic processes. Complete 
presentations of them have been given by Doob (1953), Neveu 
(1975) and Chow and Teicher (1978). 

The martingale generalizations (a)-(f) given above have 
appeared in recent years. Many results and references in this new 
area can be found in the works of Gut and Schmidt (1983) and 
Tomkins (1984a, b). 

In this section we have included examples which illustrate the 
basic properties of martingales and martingale-like sequences 
(with discrete time) and reveal the relationships between them. 


22.1. Martingales which are L¹-bounded but not L¹-dominated

Let X = (X_n, ℱ_n, n ≥ 1) be a martingale. The relation sup_{n≥1} E|X_n| ≤
E[sup_{n≥1} |X_n|] implies that every L¹-dominated martingale is
also L¹-bounded. This raises the question of whether the converse
is true. The answer is negative and will be illustrated by a few
examples.


(i) Consider the discrete space Ω = {1, 2, ...} with probability P on
it defined by P({n}) = 1/n − 1/(n + 1), n ∈ N. Let (ℱ_n, n ≥ 1) be the
increasing sequence of σ-fields where ℱ_n is generated by the
partition {{1}, {2}, ..., {n}, [n + 1, ∞)}. Define the sequence (X_n,
n ≥ 1) of r.v.s by

\[ X_n = X_n(\omega) = (n+1)\,1_{[n+1,\infty)}(\omega), \quad n \in N. \]

Then X = (X_n, ℱ_n, n ≥ 1) is a positive martingale such that EX_n = 1
for all n ∈ N and hence X is L¹-bounded. However, sup_{n∈N} X_n(ω) =
ω for ω ≥ 2, and clearly it is not integrable. Therefore the martingale X is not
L¹-dominated.
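The claims in example (i) can be verified with exact rational arithmetic. The sketch below is illustrative only; the helper names are assumptions introduced here, and the tail probability P([n, ∞)) = 1/n is the telescoping sum of the point masses 1/k − 1/(k+1).

```python
from fractions import Fraction

def prob_point(n):
    """P({n}) = 1/n - 1/(n+1)."""
    return Fraction(1, n) - Fraction(1, n + 1)

def prob_tail(n):
    """P([n, infinity)) = 1/n, by telescoping."""
    return Fraction(1, n)

def x(n, omega):
    """X_n(omega) = n + 1 on [n+1, infinity) and 0 elsewhere."""
    return n + 1 if omega >= n + 1 else 0

def mean_x(n):
    """E X_n; X_n is constant on the tail atom [n+1, infinity)."""
    return (n + 1) * prob_tail(n + 1)

def cond_mean_next_on_tail(n):
    """E[X_{n+1} | omega >= n+1], the only atom of F_n where X_n is nonzero."""
    return (n + 2) * prob_tail(n + 2) / prob_tail(n + 1)

def truncated_mean_of_sup(K):
    """E[sup_n X_n ; omega <= K]; the sup over n is attained for n < omega."""
    return sum(max(x(n, w) for n in range(1, w + 1)) * prob_point(w)
               for w in range(1, K + 1))

if __name__ == "__main__":
    print(mean_x(10), cond_mean_next_on_tail(10))
    print(float(truncated_mean_of_sup(200)))
```

The truncated expectation equals the sum of 1/(ω + 1) over 2 ≤ ω ≤ K, a harmonic-type sum, so it grows without bound as K increases.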
(ii) Let Ω = [0,1], ℱ = B_{[0,1]} and P be the Lebesgue measure.
Define


0, Ly nice 
An = Ay = ‘ anny 4+, if Ve Ww < L/n 
and Fn = o{X},..., X,}. Then (X;, Fn, n = 1) is a martingale. Since 
EX, = 3 for each n €N this martingale is L!-bounded. However, its 
supremum, sup_{n∈N} |X_n|, is not integrable and the L¹-domination
property fails to hold.
(iii) Let w = (w(t), t ≥ 0) be a standard Wiener process, ℱ_t = σ{w(s), s
≤ t}. Take any numerical sequence {n_k, k ≥ 1} such that 0 < n_1 <
n_2 < ... → ∞ as k → ∞. Denote M_k = exp[w(n_k) − ½ n_k]. Then it can
be shown that M = (M_k, ℱ_{n_k}, k ≥ 1) is a non-negative martingale
(and even that M_k → 0 a.s. as k → ∞) which is integrable but
E[sup_{k≥1} M_k] = ∞. Hence in this case the L¹-domination property
again does not hold, despite the integrability of M. One additional
example of an L¹-bounded but not L¹-dominated martingale will
be given at the end of Example 22.2.


22.2. A property of a martingale which is not preserved under 
random stopping 


Let X = (X_n, ℱ_n, n ≥ 1) be a martingale and
Y_n = (1/n)(X_1 + ... + X_n). Denote by T the set of all bounded (ℱ_n)-
stopping times and introduce the following four conditions:

(1) sup_{n≥1} E|X_n| < ∞,

(2) sup_{n≥1} E|Y_n| < ∞,

(3) sup_{τ∈T} E|X_τ| < ∞,

(4) sup_{τ∈T} E|Y_τ| < ∞.


Obviously, conditions (3) and (4) can be considered as ‘random
stopped versions’ of (1) and (2) respectively. It is well known (see
Yamazaki 1972) that conditions (1) and (2) are equivalent;
moreover, conditions (1) and (3) are also equivalent. Thus it is
natural to assume that (3) and (4) are equivalent. However, as we
shall now see, this conjecture is wrong.

Let τ ∈ T, that is τ is a positive integer-valued r.v. such that P[τ <
∞] = 1 and let P[τ > n] > 0 for every n ≥ 1. Denote by ℱ_n the σ-field
generated by the events [τ = 1], [τ = 2], ..., [τ = n]. Clearly, τ is an
(ℱ_n)-stopping time. Let {b_n, n ≥ 1} be a non-increasing sequence of
positive numbers such that b_{k−1} − b_k = 0 for those k for which P[τ
= k] = 0, and in such cases we also put (b_{k−1} − b_k)/P[τ = k] = 0.
Define the sequence (X_n, n ≥ 1) of r.v.s by

\[ (5) \quad X_n(\omega) = \sum_{k=1}^{n} \bigl[(b_{k-1} - b_k)/P[\tau = k]\bigr]\,1_{[\tau=k]}(\omega) + \bigl(b_n/P[\tau > n]\bigr)\,1_{[\tau>n]}(\omega). \]
Then it is not difficult to check that X = (X_n, ℱ_n, n ≥ 1) is a
non-negative martingale. Indeed, taking into account that [τ = 1], ..., [τ
= n − 1] and [τ > n − 1] are atoms of ℱ_{n−1}, we can easily see that

\[ \int_{[\tau=k]} (X_n - X_{n-1})\,dP = 0, \quad k = 1, \ldots, n-1, \]

\[ \int_{[\tau>n-1]} (X_n - X_{n-1})\,dP = (b_{n-1} - b_n) + (b_n - b_{n-1}) = 0. \]

These relations imply the martingale property of X. We can check
directly that condition (1) is satisfied and hence (2) and (3) hold. It
then remains for us to clarify whether condition (4) is satisfied. To
do this, consider the following variable Y_τ = (1/τ)(X_1 + ... + X_τ).
Clearly,

\[ Y_\tau \ge (1/\tau)\sum_{k=1}^{\tau-1} X_k = (1/\tau)\sum_{k=1}^{\tau-1} \bigl(b_k/P[\tau > k]\bigr) =: \eta. \]

Here η is a r.v. which takes the value (1/n) Σ_{k=1}^{n−1} (b_k/P[τ > k])
with probability equal to P[τ = n]. This implies that EY_τ ≥ Eη. So
our aim is to estimate the expectation Eη. However, we need the
following result from analysis (the proof is left to the reader): if
{a_n, n ≥ 1} is a positive non-increasing sequence converging to
zero and {b_n, n ≥ 1} is a non-negative and non-increasing
sequence, then


6 = (an-1 ~ an) Y( (b; /a;) > EI irae ctpa Wao Bal 


n=? y= 1 4 n=l 


Now let a_n = P[τ > n], and take the sequence {b_n, n ≥ 1} used to
define X by (5) to be non-increasing and bounded from below by
some positive constant, that is b_n ≥ c = constant > 0 for all n ≥ 1.
Then these two sequences, {a_n, n ≥ 1} and {b_n, n ≥ 1}, satisfy the
conditions required for the validity of (6). After some calculations
we find that Eη = ∞ and hence

E|Y_τ| = EY_τ ≥ Eη = ∞.

Therefore condition (4) does not hold in spite of the fact that (1),
(2) and (3) are satisfied.

Finally, let us look at the following possibility. It is easy to see
that the martingale (X_n, ℱ_n, n ≥ 1) defined by (5) is uniformly
integrable. If in particular we choose b_n = 1/(n + 1) and P[τ = n] =
2^{−n}, then we can check that E[sup_{n≥1} X_n] = ∞. Thus we obtain
another example of a martingale which is L¹-bounded but not
L¹-dominated (see also Example 22.1).
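For the particular choice b_n = 1/(n + 1) and P[τ = n] = 2^{−n}, formula (5) can be evaluated exactly on each atom [τ = k]. The sketch below is an illustration under these assumptions; b_0 = 1 is taken from the same formula b_n = 1/(n + 1), and all function names are introduced here for the example.

```python
from fractions import Fraction

def b(n):
    """b_n = 1/(n + 1); in particular b_0 = 1."""
    return Fraction(1, n + 1)

def p_eq(k):
    """P[tau = k] = 2^{-k}."""
    return Fraction(1, 2 ** k)

def p_gt(n):
    """P[tau > n] = 2^{-n}."""
    return Fraction(1, 2 ** n)

def x_on_atom(n, k):
    """Value of X_n on the event [tau = k], following formula (5)."""
    if k <= n:
        return (b(k - 1) - b(k)) / p_eq(k)
    return b(n) / p_gt(n)

def mean_x(n):
    """E X_n, summing over the atoms [tau = 1], ..., [tau = n], [tau > n]."""
    stopped = sum(p_eq(k) * x_on_atom(n, k) for k in range(1, n + 1))
    return stopped + p_gt(n) * x_on_atom(n, n + 1)

def truncated_mean_of_sup(K):
    """E[sup_n X_n ; tau <= K]; on [tau = k] the path is constant from n = k on."""
    total = Fraction(0)
    for k in range(1, K + 1):
        total += p_eq(k) * max(x_on_atom(n, k) for n in range(1, k + 1))
    return total

if __name__ == "__main__":
    print(mean_x(8), float(truncated_mean_of_sup(50)))
```

With these parameters the truncated expectation works out to half of the K-th harmonic number, which diverges, while EX_n stays equal to b_0 = 1 for every n.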


22.3. Martingales for which the Doob optional theorem fails to 
hold 

Let X = (X_n, ℱ_n, n ≥ 0) be a martingale and τ be an (ℱ_n)-stopping
time. Suppose the following two conditions are satisfied:

\[ \text{(a)}\ E|X_\tau| < \infty; \qquad \text{(b)}\ \lim_{n\to\infty} \int_{[\tau>n]} X_n\,dP = 0. \]

Then EX_τ = EX_0.
This statement, called the Doob optional theorem, is considered 
in many books (see Doob 1953; Kemeny et al 1966; Neveu 1975). 


Conditions (a) and (b) together are sufficient conditions for the
validity of the relation EX_τ = EX_0. Our purpose now is to clarify
whether both (a) and (b) are necessary.


(i) Let {η_n, n ≥ 1} be a sequence of i.i.d. r.v.s. Suppose η_1 takes
only the values −1, 0, 1 and Eη_1 = 0. Define X_n = η_1 + ... + η_n and
ℱ_n = σ{η_1, ..., η_n} for n ≥ 1 and X_0 = 0, ℱ_0 = {∅, Ω}. Clearly X
= (X_n, ℱ_n, n ≥ 0) is a martingale. If τ = inf{n : X_n = 1}, then τ is an
(ℱ_n)-stopping time such that P[τ < ∞] = 1 and X_τ = 1 a.s. Hence
EX_0 = 0 ≠ 1 = EX_τ which means that the Doob optional theorem
does not hold for the martingale X and the stopping time τ. Let us
check whether conditions (a) and (b) are satisfied.

It is easy to see that E|X_τ| < ∞ and thus condition (a) is satisfied.
Furthermore,

\[ 0 = \int_{\Omega} X_n\,dP = \int_{[\tau\le n]} X_n\,dP + \int_{[\tau>n]} X_n\,dP = J_1 + J_2. \]

The term J_1 is equal to the probability that level 1 has been
reached by the martingale X in time n and this probability tends to
1 as n → ∞. Since J_1 + J_2 = 0 we see that J_2 tends to −1, not to 0.
Thus condition (b) is violated.
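The behaviour of J_1 and J_2 can be computed exactly by tracking the law of X_n on the event [τ > n]. The sketch below assumes, purely for illustration, that η_1 takes the values −1, 0, 1 with probability 1/3 each; the function name and its parameters are hypothetical.

```python
from fractions import Fraction

def stopped_walk(n_steps, p_up=Fraction(1, 3), p_down=Fraction(1, 3)):
    """Return (P[tau <= n], J_2) where J_2 is the integral of X_n over [tau > n].

    Assumes p_up == p_down, so that the walk is a martingale.
    """
    p_stay = 1 - p_up - p_down
    alive = {0: Fraction(1)}  # law of X_n restricted to [tau > n]
    for _ in range(n_steps):
        nxt = {}
        for pos, mass in alive.items():
            for step, p in ((1, p_up), (0, p_stay), (-1, p_down)):
                new_pos = pos + step
                if new_pos == 1 or p == 0:
                    continue  # level 1 reached: the path leaves [tau > n]
                nxt[new_pos] = nxt.get(new_pos, Fraction(0)) + mass * p
        alive = nxt
    hit = 1 - sum(alive.values(), Fraction(0))
    j2 = sum((pos * mass for pos, mass in alive.items()), Fraction(0))
    return hit, j2

if __name__ == "__main__":
    for n in (1, 10, 100):
        hit, j2 = stopped_walk(n)
        print(n, float(hit), float(j2))
```

For every n the two numbers add up to zero, which is the optional stopping identity for the bounded time min(τ, n); as n grows the first tends to 1 and the second to −1.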


(ii) Let ξ_1, ξ_2, ... be independent r.v.s where ξ_n ~ N(0, b_n). Here
the variances b_n, n ≥ 1, are chosen as follows. We take b_1 = 1 and
b_{n+1} = a_{n+1}² − a_n² for n ≥ 1 where a_n = (n − 1)²/log(3 + n). The
reason for this special choice will become clear later.

Define X_n = ξ_1 + ... + ξ_n and ℱ_n = σ{ξ_1, ..., ξ_n}. Then X = (X_n,
ℱ_n, n ≥ 1) is a martingale. Let g be a measurable function from R¹
to N with P[g(ξ_1) = n] = p_n where p_n = n^{−2} − (n + 1)^{−2}, n ≥ 1. Thus
τ = g(ξ_1) is a stopping time and moreover its expectation is finite.
It can be shown that the relation EX_τ = EX_1 does not hold. So let
us check whether conditions (a) and (b) are satisfied.

Denote by F the d.f. of ξ_1 and let S_1 = 0, S_n = ξ_2 + ... + ξ_n for n
≥ 2. Thus ξ_1 is independent of S_1, S_2, ... and X_n = ξ_1 + S_n where
S_n ~ N(0, a_n²). Now we have to compute the quantities E|X_τ| and
∫_{[τ>n]} |X_n| dP. We find

\[ \int_{[\tau>n]} |X_n|\,dP = \int_{[g>n]} E[|y + S_n|]\,dF(y) \le \int_{[g>n]} \{|y| + E[|S_n|]\}\,dF(y) = \int_{[g>n]} |y|\,dF(y) + c\,a_n P[g > n] \]

where c = E[|ξ_1|]. It is easy to conclude that ∫_{[τ>n]} |X_n| dP → 0 as
n → ∞ and hence condition (b) is satisfied. Furthermore,

\[ E|X_\tau| = \int E[|y + S_{g(y)}|]\,dF(y) \ge \int E[|S_{g(y)}|]\,dF(y) = c \int a_{g(y)}\,dF(y) = c \sum_{n} p_n a_n = \infty \]

and condition (a) is not satisfied.

Examples (i) and (ii) show that both conditions (a) and (b) are
essential for the validity of the Doob optional theorem.


22.4. Every quasimartingale is an amart, but not conversely 


It is easy to show that every quasimartingale is also an amart (for 
details see Edgar and Sucheston 1976a; Gut and Schmidt 1983). 
However, the converse is not always true. This will be illustrated
by two simple examples. 


(i) Let a_n = (−1)^n n^{−1}, n ≥ 1. Take X_n = a_n a.s. and choose an
arbitrary sequence {τ_n, n ≥ 1} of bounded stopping times with the
only condition that τ_n ↑ ∞ as n → ∞. Since a_n → 0 as n → ∞, we
have X_{τ_n} → 0 a.s. as n → ∞. Moreover, |a_n| ≤ 1 implies that EX_{τ_n} →
0 as n → ∞. Hence for any increasing sequence of σ-fields (ℱ_n, n ≥ 1)
to which (τ_n) are related, the system (X_n, ℱ_n, n ≥ 1) is an amart.
However, Σ_{n=1}^∞ E|X_n − E(X_{n+1} | ℱ_n)| = Σ_{n=1}^∞ |a_n − a_{n+1}| = ∞
and the amart (X_n, ℱ_n, n ≥ 1) is not a quasimartingale.
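The divergence of the quasimartingale sum for a_n = (−1)^n/n can be made explicit, since consecutive terms have opposite signs and |a_n − a_{n+1}| = 1/n + 1/(n + 1). The following sketch, with illustrative function names, compares the partial sums with harmonic numbers.

```python
from fractions import Fraction

def a(n):
    """a_n = (-1)^n / n."""
    return Fraction((-1) ** n, n)

def quasimartingale_partial_sum(N):
    """Sum over n = 1..N of |a_n - a_{n+1}|."""
    return sum(abs(a(n) - a(n + 1)) for n in range(1, N + 1))

def harmonic(N):
    """H_N = 1 + 1/2 + ... + 1/N."""
    return sum(Fraction(1, n) for n in range(1, N + 1))

if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, float(quasimartingale_partial_sum(N)), float(a(N)))
```

The partial sums equal 2H_N − 1 + 1/(N + 1) and so grow like 2 log N, even though a_n itself tends to 0.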

(ii) Let (X_n, n ≥ 1) be a sequence of i.i.d. r.v.s such that P[X_n = 1]
= P[X_n = −1] = ½ and let (c_n, n ≥ 1) be positive real numbers, c_n ↓
0 as n → ∞ and Σ_{n=1}^∞ c_n = ∞. Consider the sequence (Y_n, n ≥ 1)
where Y_n = c_n X_1 ... X_n and the σ-fields ℱ_n = σ{X_1, ..., X_n}.
Clearly, Y_n is ℱ_n-measurable for every n ≥ 1. Since
|Y_n| ≤ c_n ↓ 0, Y_{τ_n} → 0 a.s. as n → ∞ for any sequence of bounded
stopping times (τ_n, n ≥ 1) such that τ_n ↑ ∞ as n → ∞. Applying the
dominated convergence theorem, we conclude that EY_{τ_n} → 0 as n
→ ∞, so Y = (Y_n, ℱ_n, n ≥ 1) is an amart. However,

\[ \sum_{n=1}^{\infty} E|Y_n - E(Y_{n+1} \mid \mathcal{F}_n)| = \sum_{n=1}^{\infty} E|Y_n| = \sum_{n=1}^{\infty} c_n = \infty \]

and therefore the amart Y is not a quasimartingale.


22.5. Amarts, martingales in the limit, eventual martingales 
and relationships between them 


(i) Let ξ_1, ξ_2, ... be a sequence of positive i.i.d. r.v.s such that Eξ_1
< ∞ and E[ξ_1 log⁺ ξ_1] = ∞. Consider the sequence X_1, X_2, ...
where X_n = ξ_n/n and the family (ℱ_n, n ≥ 1) with ℱ_n = σ{ξ_1, ...,
ξ_n}. It is easy to check that X_n → 0 a.s. as n → ∞. Moreover, EX_n →
0 as n → ∞ and E[sup_{n≥1} X_n] = ∞. It follows that X = (X_n, ℱ_n, n ≥
1) is a martingale in the limit, but X is not an amart because the net
(EX_τ, τ ∈ T) is unbounded where T is the set of all bounded (ℱ_n)-
stopping times.

(ii) Consider the sequence (η_n, n ≥ 1) of independent r.v.s, P[η_n =
1] = n^{−2} = 1 − P[η_n = 0]. Let ℱ_n = σ{η_1, ..., η_n} and X_n = η_1 + ...
+ η_n. Since

\[ E(X_n \mid \mathcal{F}_m) - X_m = \sum_{k=m+1}^{n} k^{-2} \ \Rightarrow\ \lim_{n \ge m \to \infty} \bigl(E(X_n \mid \mathcal{F}_m) - X_m\bigr) = 0 \ \text{a.s.} \]

we conclude that X = (X_n, ℱ_n, n ≥ 1) is a martingale in the limit.
Moreover,

\[ E\Bigl[\sum_{k=1}^{\infty} \eta_k\Bigr] = \sum_{k=1}^{\infty} k^{-2} < \infty \quad \text{and} \quad |X_n| \le \sum_{k=1}^{\infty} |\eta_k|, \ n \ge 1 \]

imply that X is even uniformly integrable. Despite these properties,
X is not an eventual martingale. This follows from the relation

E(X_n | ℱ_{n−1}) = X_{n−1} + n^{−2} ≠ X_{n−1} for all n ≥ 2

and definition (f) (see the introductory notes in this section).


22.6. Relationships between amarts, progressive martingales 
and quasimartingales 

(i) Let (ξ_n, n ≥ 1) be independent r.v.s such that P[ξ_n = 1] = n/(n
+ 1) = 1 − P[ξ_n = 0], n ≥ 1. Define η_1 = 1 and for n ≥ 2, η_n =
(−1)^{n−1} ξ_1 ξ_2 ... ξ_{n−1}. Further, let X_n = η_1 + ... + η_n and ℱ_n = σ{ξ_1,
..., ξ_n}. Obviously, for every n, X_n is either 0 or 1. Moreover, by
the Borel-Cantelli lemma, P[ξ_n = 0 i.o.] = 1 which implies that
P[η_n ≠ 0 i.o.] = 0. However, E[η_{n+1} | ℱ_n] = η_{n+1} a.s. and η_{n+1} = 0 if
η_n = 0. Hence X = (X_n, ℱ_n, n ≥ 1) is a progressive martingale. Let
us check if X is a quasimartingale. We have

\[ E\eta_n = (-1)^{n-1} n^{-1} \]

and

\[ \sum_{n=2}^{\infty} E\bigl|E(\eta_n \mid \mathcal{F}_{n-1})\bigr| = \sum_{n=2}^{\infty} n^{-1} = \infty. \]

Therefore the progressive martingale X is not a quasimartingale.
(ii) Let us now describe a random sequence which is a progressive
martingale but not an amart.

Consider the sequence (ξ_n, n ≥ 1) of independent r.v.s where
P[ξ_n = 1] = n/(n + 1) = 1 − P[ξ_n = 0] (as in case (i) above). Let X_1 = 1
and for n > 1, X_n = n² ξ_1 ξ_2 ... ξ_{n−1}, and ℱ_n = σ{ξ_1, ..., ξ_n}, n ≥ 1.
Clearly,

\[ E[X_n \mid \mathcal{F}_{n-1}] = X_n = n^2 (n-1)^{-2}\, \xi_{n-1} X_{n-1} \quad \text{a.s.} \]

By the Borel-Cantelli lemma P[ξ_n = 0 i.o.] = 1 and since X_{n−1} = 0
implies that X_n = 0, we conclude that P[X_n ≠ 0 i.o.] = 0.
Consequently X = (X_n, ℱ_n, n ≥ 1) is a progressive martingale.
However, EX_n = n → ∞ as n → ∞ which shows that X cannot be an
amart.
(iii) Recall that every quasimartingale is also an amart and a
martingale in the limit. Let us illustrate that the converse is false.
Consider the sequence (X_n, n ≥ 1) given by
X_n = Σ_{k=1}^n (−1)^{k−1} k^{−1} and let ℱ_n = ℱ_0 for all n ≥ 1. Then X =
(X_n, ℱ_n, n ≥ 1) is an amart and also a martingale in the limit.


Further, we have 0O< Xn < Lee yh oe aue 67 
However,

\[ \sum_{n=1}^{\infty} E\bigl|E(X_{n+1} \mid \mathcal{F}_n) - X_n\bigr| = \sum_{n=1}^{\infty} \frac{1}{n+1} = \infty \]

and therefore X is not a quasimartingale.


22.7. An eventual martingale need not be a game fairer with 
time 
Let (&,, n = 1) be independent r.v.s such that P[é, =—-1]=2 "=1 


= Pio; =1),7 & | Let = Ol Cy ony 1 = = 2 oe 
I(é, =—1) for n= 1 and X, =, +... + y,,n = 1. Then for k > 1 
we find 


E[ne|Fr—1] = 2° *2(En-1 = —1)E(Ex|Fn-1) 
= (2"~* — 1)I(,-1 = —1). 


Hence 


y=? PIE( Xn|Fn— 1) # Xn- (jae 2 Pié,— ce eee pet camer ~0 8) 
Therefore X = (X_n, ℱ_n, n ≥ 1) is an eventual martingale.


Now take m > 2. Then 
E( Xan = Aun ley) = rine i + Ble l ) ea | 
= ae ae ~ 1)P[&,- he =H ‘5 (2™ ~ L)I (Em = 1) 
> Seo i> re Aa _ / aia a > >: 


AL. 
Hence if 0 < ¢ <2 we obtain 


P[|E(X_{2m} | ℱ_m) − X_m| > ε] = 1 for all m ≥ 2.

This means that X is not a game fairer with time.


22.8. Not every martingale-like sequence admits a Riesz 
decomposition 

Recall that the random sequence (X_n, ℱ_n, n ≥ 1) is said to admit the
Riesz decomposition if X_n = M_n + Z_n, n ≥ 1, where (M_n, ℱ_n, n ≥ 1)
is a martingale and E[Z_n 1_A] → 0 as n → ∞ for every
A ∈ ∪_{n=1}^∞ ℱ_n. If this property holds then the sequence (EX_n) must
converge since

EX_n = EM_n + EZ_n = EM_1 + E[Z_n 1_Ω] → EM_1 as n → ∞.

There are of course martingale-like sequences which admit the
Riesz decomposition. However, this property does not always
hold.

Consider the sequence (ξ_n, n ≥ 1) of i.i.d. r.v.s such that P[ξ_1 =
4] = P[ξ_1 = 0] = ½. Let X_n = ξ_1 ξ_2 ... ξ_n and ℱ_n = σ{ξ_1, ..., ξ_n}, n ≥
1. Since EX_n = 2^n → ∞, (X_n, ℱ_n, n ≥ 1) does not admit a Riesz
decomposition. It remains for us to show that (X_n, ℱ_n, n ≥ 1) is a
martingale-like sequence in the sense of at least one of the
definitions (a)-(f) given in the introductory notes. In particular, it is
easy to see that E[X_{n+1} | ℱ_n] = 2X_n. Also we have X_{n+1} = 0 if X_n =
0. By the Borel-Cantelli lemma we conclude that P[X_n ≠ 0 i.o.] = 0
and therefore (X_n, ℱ_n, n ≥ 1) is a progressive martingale.


22.9. On the validity of two inequalities for martingales 


Here we shall consider two important inequalities for martingales 
and analyse the conditions under which they hold or fail to hold. 


(i) Let (X_n, ℱ_n, n ≥ 1) be a martingale and g : R¹ → R¹ be a
measurable function which is: (a) positive over R¹; (b) even; and
(c) convex; that is, for any x, y ∈ R¹,
g(½(x + y)) ≤ ½g(x) + ½g(y). Then for an arbitrary ε > 0,

\[ (1) \quad P\Bigl[\sup_{1 \le k \le n} |X_k| \ge \varepsilon\Bigr] \le E[g(X_n)]/g(\varepsilon). \]

Note that this extension of the classical Kolmogorov inequality was
obtained by Zolotarev (1961). Now we should like to show that the
convexity of g is essential for the validity of (1).

Suppose g satisfies conditions (a) and (b) but not (c). Since g is
not convex in this case, there exist a and h, 0 < h < a, such that

(2) g(a) > ½g(a − h) + ½g(a + h).

Consider the r.v.s X_1 = ξ_1 and X_2 = ξ_1 + ξ_2 where ξ_1 and ξ_2 are
independent, ξ_1 takes the values ±a with probability ½ each, and ξ_2
takes the values ±h also with probability ½ each. It is easy to check
that E[X_2 | X_1] = X_1 a.s. Thus letting ℱ_1 = σ{ξ_1}, ℱ_2 = σ{ξ_1, ξ_2}
we find that the system (X_k, ℱ_k, k = 1, 2) is a martingale. Since g is
an even function, taking (2) into account we obtain

\[ E[g(X_2)] = \tfrac{1}{4}[g(-a-h) + g(-a+h) + g(a-h) + g(a+h)] = \tfrac{1}{2}[g(a-h) + g(a+h)] < g(a) \]

so that

\[ P\Bigl[\sup_{1 \le k \le 2} |X_k| \ge a\Bigr] = 1 > E[g(X_2)]/g(a). \]

Therefore inequality (1) does not hold for the martingale
constructed above, taking ε = a.
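A concrete instance of this construction can be enumerated directly. The sketch below uses g(x) = 1 + sqrt(|x|), a = 2 and h = 1; these are illustrative choices made here (not taken from the original) of a positive, even, non-convex function and of points satisfying (2).

```python
import math
from itertools import product

def g(x):
    """Positive, even, non-convex test function (an illustrative choice)."""
    return 1.0 + math.sqrt(abs(x))

def check_counterexample(a=2.0, h=1.0):
    """Return (P[max(|X_1|, |X_2|) >= a], E[g(X_2)] / g(a)) for the two-step martingale."""
    # The four equally likely paths (X_1, X_2) = (s1*a, s1*a + s2*h).
    paths = [(s1 * a, s1 * a + s2 * h) for s1, s2 in product((-1, 1), repeat=2)]
    lhs = sum(max(abs(x1), abs(x2)) >= a for x1, x2 in paths) / len(paths)
    rhs = sum(g(x2) for _, x2 in paths) / len(paths) / g(a)
    return lhs, rhs

if __name__ == "__main__":
    print(check_counterexample())
```

Since |X_1| = a on every path, the left-hand side of (1) equals 1, while the right-hand side is strictly smaller than 1 for this g.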


(ii) Let X = (X_n, ℱ_n, n ≥ 1) be a martingale and
[X]_n = Σ_{j=1}^n (ΔX_j)², ΔX_j = X_j − X_{j−1}, X_0 = 0, be its quadratic
variation. Then for every p > 1 there are universal constants A_p
and B_p (independent of X) such that

\[ (3) \quad A_p \bigl\|\sqrt{[X]_n}\bigr\|_p \le \|X_n\|_p \le B_p \bigl\|\sqrt{[X]_n}\bigr\|_p \]

where ‖X_n‖_p = (E|X_n|^p)^{1/p}.

Note that inequalities (3), called Burkholder inequalities, are
often used in the theory of martingales (for details see Burkholder
and Gundy 1970; Shiryaev 1995). We shall now check that the
condition on p, namely p > 1, is essential. By a simple example we
can illustrate that (3) fails to hold if p = 1.

Let ξ_1, ξ_2, ... be independent Bernoulli r.v.s with P[ξ_j = 1] =
P[ξ_j = −1] = ½ and let X_n = Σ_{j=1}^{τ∧n} ξ_j where
τ = inf{n ≥ 1 : Σ_{j=1}^n ξ_j = 1}. If ℱ_n = σ{ξ_1, ..., ξ_n} then it
is easy to see that the sequence X = (X_n, ℱ_n, n ≥ 1) is a martingale
with the property

\[ \|X_n\|_1 = E|X_n| = 2E[X_n^+] \to 2 \quad \text{as } n \to \infty. \]

However,

\[ \bigl\|\sqrt{[X]_n}\bigr\|_1 = E\sqrt{[X]_n} = E\Bigl(\sum_{j=1}^{\tau \wedge n} 1\Bigr)^{1/2} = E\sqrt{\tau \wedge n} \to \infty \quad \text{as } n \to \infty. \]

Therefore in general inequalities (3) cannot hold for p = 1.


22.10. On the convergence of submartingales almost surely and
in L¹-sense

Let (X_n, ℱ_n, n ≥ 1) be a submartingale satisfying the condition

(1) sup_{n≥1} E|X_n| < ∞.

Then, according to the classical Doob theorem, the limit X_∞ :=
lim_{n→∞} X_n exists a.s. and E|X_∞| < ∞. Moreover, if (X_n, ℱ_n, n ≥ 1)
is a uniformly integrable submartingale, then there is a r.v. X_∞ with
E|X_∞| < ∞ such that X_n → X_∞ a.s. and X_n → X_∞ in L¹ as n → ∞. The
proof of these and of many other close results can be found in the
books by Doob (1953), Neveu (1975), Chow and Teicher (1978)
and Shiryaev (1995).

Let us now consider a few examples with the aim of illustrating 
the importance of the conditions under which the above results 
hold. 


(i) Let {ξ_n, n ≥ 1} be i.i.d. r.v.s with P[ξ_1 = 0] = P[ξ_1 = 2] = ½.
Define X_n = ξ_1 ... ξ_n and ℱ_n = σ{ξ_1, ..., ξ_n}, n ≥ 1. Then (X_n, ℱ_n, n
≥ 1) is a martingale with EX_n = 1 for all n ≥ 1. Hence condition (1)
is satisfied, which implies that X_n → X_∞ a.s. as n → ∞ where X_∞ is a
r.v. with E|X_∞| < ∞. Clearly we have P[X_n = 2^n] = 2^{−n}, P[X_n = 0] =
1 − 2^{−n} and X_∞ = 0 a.s. However, E|X_n − X_∞| = EX_n = 1. Therefore
X_n does not converge to X_∞ in the L¹-sense
despite the a.s. convergence of X_n to X_∞.
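The exact law of X_n in example (i) is easy to tabulate, which makes the gap between a.s. convergence and L¹-convergence visible. The sketch below is illustrative; the function names are assumptions introduced here.

```python
from fractions import Fraction

def law_of_x(n):
    """Exact law of X_n = xi_1 * ... * xi_n with P[xi = 0] = P[xi = 2] = 1/2."""
    law = {1: Fraction(1)}  # empty product
    for _ in range(n):
        nxt = {}
        for value, mass in law.items():
            for factor in (0, 2):
                key = value * factor
                nxt[key] = nxt.get(key, Fraction(0)) + mass / 2
        law = nxt
    return law

def mean(law):
    """Expectation of a finitely supported law given as {value: probability}."""
    return sum((value * mass for value, mass in law.items()), Fraction(0))

if __name__ == "__main__":
    for n in (1, 5, 20):
        law = law_of_x(n)
        print(n, float(law[0]), mean(law))
```

The mass at 0 tends to 1, so X_n tends to 0 in probability (and a.s., since 0 is absorbing), while E|X_n − 0| = EX_n stays equal to 1.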


(ii) Let (Ω, ℱ, P) be a probability space defined by
Ω = [0,1], ℱ = B_{[0,1]} and let P be the Lebesgue measure. On this
space we consider the random sequence (X_n, n ≥ 1) where X_n =
X_n(ω) = 2^n if ω ∈ [0, 2^{−n}] and X_n = X_n(ω) = 0 if ω ∈ (2^{−n}, 1] and
let ℱ_n = σ{X_1, ..., X_n}. Then (X_n, ℱ_n, n ≥ 1) is a martingale with
EX_n = 1 for all n ≥ 1 and, by (1), X_n → X_∞ a.s. as n → ∞ with X_∞
= 0. Again, as above, X_n does not converge to 0 in the L¹-sense as n → ∞.

So, having examples (i) and (ii), we conclude that the Doob
condition (1) guarantees a.s. convergence but not convergence in
the L¹-sense. In both cases we have E|X_n| = 1, n ≥ 1, which means
that the corresponding martingales are not uniformly integrable.


(iii) Let us consider this further. Recall that the martingale X = (X_n,
ℱ_n, n ≥ 1) is said to be regular if there exists an integrable r.v. ξ
such that X_n = E[ξ | ℱ_n] a.s. for each n ≥ 1. Clearly, if the parameter
n takes only a finite number of values, say n = 1, ..., N, then such a
martingale is regular since X_n = E[X_N | ℱ_n]. However, if n ∈ N, the
martingale need not be regular.

Note first the following result (see Shiryaev 1995): the
martingale X is regular iff X is uniformly integrable. In this case X_n
= E[X_∞ | ℱ_n] where X_∞ := lim_{n→∞} X_n.

Consider the sequence (ξ_k, k ≥ 1) of i.i.d. r.v.s each distributed
N(0,1) and let S_n = ξ_1 + ... + ξ_n, X_n = exp(S_n − ½n), ℱ_n = σ{ξ_1, ..., ξ_n}.
Then we can easily check that X = (X_n, ℱ_n, n ≥ 1) is a martingale.
Applying the SLLN to the sequence (ξ_k, k ≥ 1) we find that

\[ X_\infty = \lim_{n\to\infty} X_n = \lim_{n\to\infty} \exp\Bigl[n\Bigl(\frac{S_n}{n} - \frac{1}{2}\Bigr)\Bigr] = 0 \quad \text{a.s.} \]

Therefore a.s.

\[ X_n \ne E[X_\infty \mid \mathcal{F}_n] = 0. \]

Thus we have shown that the martingale X is not regular and it can
be verified that it is not uniformly integrable.


22.11. A martingale may converge in probability but not 
almost surely 


Recall that for series of independent r.v.s the two kinds of
convergence, in probability and with probability 1, are equivalent
(see e.g. Loève 1978, Itô 1984, Rao 1984). This result leads to the
following question for the martingale M = (M_n, ℱ_n, n ≥ 1). If we
know that M_n converges in probability as n → ∞, does this imply
its convergence with probability 1? (The converse, of course, is
always true.)


(i) Let (ξ_n, n ≥ 1) be a sequence of independent r.v.s where

P[ξ_n = ±1] = (2n)^{−1}, P[ξ_n = 0] = 1 − n^{−1}, n ≥ 1.

Consider a new sequence (X_n, n ≥ 0) given by X_0 = 0 and

\[ X_n = \begin{cases} \xi_n, & \text{if } X_{n-1} = 0 \\ n X_{n-1} |\xi_n|, & \text{if } X_{n-1} \ne 0, \end{cases} \qquad n \ge 1. \]

Let ℱ_n = σ{ξ_1, ..., ξ_n}. We can easily verify that the following four
statements hold:

(a) X = (X_n, ℱ_n, n ≥ 1) is a martingale;
(b) for each n ≥ 1, X_n = 0 iff ξ_n = 0;
(c) P[X_n = 0] = P[ξ_n = 0] = 1 − n^{−1};
(d) P[X_n ≠ 0 i.o.] = P[ξ_n ≠ 0 i.o.] = 1.

Note that statement (d) follows from the relation
Σ_{n=1}^∞ P[|ξ_n| = 1] = ∞.

We are interested in the behaviour of X_n as n → ∞. Obviously,
(c) implies that X_n → 0 in probability as n → ∞. However, (d) shows that P[ω :
X_n(ω) converges] = 0. Thus the martingale X converges in
probability but not with probability 1.


(ii) Let (ξ_n, n ≥ 1) be a sequence of i.i.d. r.v.s each taking the
values ±1 with probability ½. Define ℱ_n = σ{ξ_1, ..., ξ_n} and let (B_n,
n ≥ 1) be a sequence of events adapted to the family (ℱ_n), that is
B_n ∈ ℱ_n for each n ≥ 1 and such that lim_{n→∞} P(B_n) = 0 and P(lim
sup_{n→∞} B_n) = 1. Consider the random sequence (X_n, n ≥ 1) where
X_1 = 0 and

\[ X_{n+1} = X_n(1 + \xi_{n+1}) + 1_{B_n}\,\xi_{n+1}, \quad n \ge 1. \]

It is easy to check that X = (X_n, ℱ_n, n ≥ 1) is a martingale. Since

P[X_{n+1} ≠ 0] ≤ ½P[X_n ≠ 0] + P(B_n)

we conclude that

lim_{n→∞} P[X_n = 0] = 1, P[ω : X_n(ω) converges] = 0.

Therefore the martingale X is a.s. divergent despite the fact that it
converges in probability.


(iii) The existence of martingales obeying some special properties
can be proved by using the following result (see Bojdecki 1977).
Let the probability space (Ω, ℱ, P) consist of Ω = [0,1], ℱ = B_{[0,1]}
and P the Lebesgue measure. For any sequence (ξ_n, n ≥ 1) of
simple r.v.s (ξ_n is simple if it takes a finite number of values) there
exists a martingale (X_n, ℱ_n, n ≥ 1) such that

P[ξ_n = X_n for all sufficiently large n] = 1.

Recall that there are sequences of simple r.v.s converging in
probability but not a.s., and other sequences which are bounded but
not converging. Thus in these particular cases we come to the
following two statements.

(a) There exists a martingale (X_n, ℱ_n, n ≥ 1) such that

X_n → 0 in probability as n → ∞ but P[ω : X_n(ω) converges to 0] = 0.

(b) There exists a martingale (X_n, ℱ_n, n ≥ 1) such that

P[ω : (X_n, n ≥ 1) is bounded] = 1 but P[ω : X_n(ω) converges] = 0.


22.12. Zero-mean martingales which are divergent with a given 
probability 

(i) Let (ξ_n, n ≥ 1) be a sequence of i.i.d. r.v.s with Eξ_1 = 0 and
E|ξ_1| > 0. Take another sequence (η_n, n ≥ 1) of independent r.v.s with
Eη_n = 0, E[η_n²] = n^{−2}, n ≥ 1, and consider the two series,
Σ_{n=1}^∞ ξ_n and Σ_{n=1}^∞ η_n. According to Chung and Fuchs (1951), the
series Σ_{n=1}^∞ ξ_n diverges a.s. On the other hand, the series
Σ_{n=1}^∞ η_n converges by the Kolmogorov three-series theorem.

Assume that (ξ_n, n ≥ 1) and (η_n, n ≥ 1) are independent of each
other and take another r.v. X_0 which is independent of both
sequences (ξ_n) and (η_n) and is such that P[X_0 = 1] = p = 1 − P[X_0
= −1] where p is any fixed number in the interval [0,1]. Define the
new sequence (X_n, n ≥ 1) as:

X_n = ξ_n I(X_0 = 1) + η_n I(X_0 = −1).

Let ℱ_n = σ{X_1, ..., X_n} and put S_n = Σ_{k=1}^n X_k. Then (S_n, ℱ_n, n ≥
1) is a martingale with ES_n = 0, n ≥ 1. The question of obvious
interest is what happens to the sequence (S_n) when n → ∞. Since

\[ S_n = I(X_0 = 1)\sum_{k=1}^{n} \xi_k + I(X_0 = -1)\sum_{k=1}^{n} \eta_k \]

it follows that

P[S_n converges] = P[X_0 = −1] = 1 − p, P[S_n diverges] = P[X_0 = 1] = p.
(ii) Let (w_t, t ≥ 0) be a standard Wiener process on (Ω, ℱ, P) which


is adapted to the given filtration (Fi, t > 0) where 70 = {0,} and 


= Vi>o Fe. Let us take an event A € F with 0 < P(A) < 1. Define 
the random sequence 


\[ X_n = X_n(\omega) = \begin{cases} 0, & \text{if } \omega \in A \\ w_n(\omega), & \text{if } \omega \in A^c, \end{cases} \qquad n \ge 1. \]

Then (X_n, n ≥ 1) is a martingale with respect to the filtration (ℱ_n, n
≥ 1). This is a simple consequence of the martingale property of
the Wiener process. Indeed, for any n > m we have a.s.

\[ E[X_n \mid \mathcal{F}_m] = E[w_n 1_{A^c} \mid \mathcal{F}_m] = 1_{A^c} E[w_n \mid \mathcal{F}_m] = 1_{A^c} w_m = X_m. \]

Furthermore, it is well known (see Freedman 1971) that

\[ P\Bigl[\limsup_{n\to\infty} w_n = \infty\Bigr] = 1, \qquad P\Bigl[\liminf_{n\to\infty} w_n = -\infty\Bigr] = 1. \]

From these relations we conclude that

P[ω : X_n(ω) converges as n → ∞] = P(A)

where, to repeat, P(A) is a fixed number between 0 and 1.


22.13. More on the convergence of martingales 


Here we present three examples of martingales X = (X_n, ℱ_n, n ≥ 1)
which satisfy the condition sup_{n≥1} |X_n| < ∞ a.s. but have quite
different behaviour as n → ∞. It will be shown that X may not be
convergent, or convergent with a given probability (as in Example
22.12), or a.s. divergent.


(i) Let (ξ_k, k ≥ 1) be independent r.v.s with P[ξ_k = 2^k − 1] = 2^{−k}
and P[ξ_k = −1] = 1 − 2^{−k}. Defining τ = inf{k : ξ_k ≠ −1} we find
that

\[ P[\tau = \infty] = \prod_{k=1}^{\infty} (1 - 2^{-k}) > 0. \]

Consider the sequence (X_n, n ≥ 1) and the family (ℱ_n, n ≥ 1)
defined by

\[ X_n = \sum_{k=1}^{\tau \wedge n} (-1)^k \xi_k, \qquad \mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}, \quad n \ge 1. \]

Then (X_n, ℱ_n, n ≥ 1) is a martingale and for X* = sup_{n≥1} |X_n| we
have

\[ P[X^* > 2^n] \le \sum_{k=n+1}^{\infty} 2^{-k} = 2^{-n}. \]

Hence for all n ≥ 1, we have 2^n P[X* > 2^n] ≤ 1 and λP[X* > λ] ≤ 2
for arbitrary λ > 0.

Thus we have shown that X* < ∞ a.s. However, on the set [τ =
∞] which has positive probability, X_n alternates between 1 and 0,
and hence (X_n) does not converge as n → ∞.
(ii) Let (¢,, 1 = 1) be independent r.v.s such that 
Plé2, — 1] — 1—n-* = 1—Pléo, = —(n* — 1), 
Plfon-1 = —1] = 1- n7=1- Pia. = tie Lig: ied. 
Obviously Eé, = 0 for all n > 1. By the Borel-Cantelli lemma 


P| 2n+1 fe Fon = 0 i.0.| =(0 and Pea! == il 1.0.| =), 
Let S, = é, +... + & and Fn= of€), ..., &)}. Define the stopping 
time 
P= inane leg) se Lt 
where m = m(p) is chosen so that PIU mtifnl # 1] < 1 —P for 
some fixed p, 0 < p< 1. Finally, let Xn = Sran-. Then it is easy to 
check that (X,,, n, n > 1) is a martingale and 
A= Seite > a) Sable S72). 


Let us note that S_n I(τ > n) is either 0 or +1 or −1, and so for each n

|X_n| ≤ 1 + |S_τ| I(τ < ∞).

However, X_n = S_n on the set [τ = ∞] and thus S_n diverges a.s. since
its ξ_j summands alternate between 1 and −1 for all large n.
Therefore

P[X_n diverges as n → ∞] ≥ P[τ = ∞] ≥ p.
(iii) What is the limit behavior of a martingale (X_n, ℱ_n, n = 1, 2, ...)
whose differences are bounded, i.e. |X_{n+1} − X_n| ≤ M < ∞ a.s.?
Define two events C = {lim_n X_n exists and is finite}, D =
{lim sup_n X_n = +∞ and lim inf_n X_n = −∞}. Then, as shown in Durrett
(1991), we have P(C ∪ D) = 1.

Is this kind of property valid if we replace the boundedness
condition above by a weaker one, e.g. by sup |X_n| < ∞? To answer,
consider a sequence U_1, U_2, ... of i.i.d. uniform r.v.s on (0,1) and
define a Markov chain (X_n, n = 1, 2, ...) starting from position X_1 =
0 and evolving for n = 1, 2, ... as follows:


1, if X,=0,Unsi> 


Deere _ —], ut ae — (), Unt <3 
i. . Sf eS 0) Bean 


3S ple ple 


With Jn = o(X}, ..., X,), we have 


| | a ee 
E(Xn41|Fn) = 5 1{Xn=0} = 5 Xn=0} i n’Xn—~5 = Xp 


and hence (X,, Jn, n = 1, 2, ...) is a martingale. Since 
Yin=1(l/n*) < ©, the Borel-Cantelli lemma implies that P(X, = 
a 1.0.) = 1 for a = —1,0 and 1 and that sup [X,| < «, 1.e. the 
martingale will eventually ‘oscillate’ from the initial position 0 to 


+] and back to 0. 
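The transition rule of this chain can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `step` and `conditional_mean` are hypothetical, and the handling of the boundary cases $U_{n+1} = 1/2$ and $U_{n+1} = n^{-2}$ is an assumption (it has probability zero and does not affect the law of the chain). The second function computes $E(X_{n+1} \mid X_n = x)$ exactly from the transition probabilities, which makes the martingale property visible.

```python
from fractions import Fraction

def step(x, n, u):
    """One transition X_n -> X_{n+1} driven by a uniform draw u in (0, 1)."""
    if x == 0:
        # from 0 the chain moves to +1 or -1 with probability 1/2 each
        return 1 if u >= 0.5 else -1
    # away from 0: return to 0 with probability 1 - n**-2, else jump to n**2 * x
    return 0 if u >= n ** -2 else n * n * x

def conditional_mean(x, n):
    """Exact E[X_{n+1} | X_n = x], computed from the transition probabilities."""
    if x == 0:
        return Fraction(1, 2) * 1 + Fraction(1, 2) * (-1)
    p_jump = Fraction(1, n * n)
    return (1 - p_jump) * 0 + p_jump * (n * n * x)
```

For every state `x` and every `n`, `conditional_mean(x, n)` returns `x`, although a single large jump to `n * n * x` remains possible at each step with probability $n^{-2}$.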


22.14. A uniformly integrable martingale with a nonintegrable 
quadratic variation 


Suppose $M = (M_n, n = 0, 1, \ldots)$ is a uniformly integrable martingale. Then the series $\sum_{n \ge 1} \Delta_n M$ of the successive differences $\Delta_n M = M_n - M_{n-1}$ ($M_0 = 0$) is $L^1$-convergent. A natural question is whether $L^1$-convergence also holds for all subseries $\sum_{n \ge 1} v_n \Delta_n M$, called Burkholder martingale transforms. Here $v_n \in \{0, 1\}$, $n \ge 1$.

Dozzi and Imkeller (1990) have shown that the integrability of the quadratic variation $S(M) := \{\sum_{n \ge 1} (\Delta_n M)^2\}^{1/2}$ implies that all series $\sum_{n \ge 1} v_n \Delta_n M$ are $L^1$-convergent. Moreover, if $S(M)$ is not integrable, then there is a sequence $\{v_n, n \ge 1\}$ such that $\sum_{n \ge 1} v_n \Delta_n M$ is not integrable.

Let us describe an explicit example of a uniformly integrable martingale $M$ with a nonintegrable quadratic variation $S(M)$ and construct a nonintegrable martingale transform $\sum_{n \ge 1} v_n \Delta_n M$.

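Consider the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = [1, \infty)$, $\mathcal{F}$ is the σ-field of the Lebesgue-measurable sets in $\Omega$ and $P(d\omega) = c e^{-\omega} d\omega$. Here $c = e$ is the norming constant and $P$ corresponds to a shifted exponential distribution Exp(1). Introduce the r.v. $M_\infty$ and the filtration $(\mathcal{F}_k, k = 1, 2, \ldots)$ as follows:

$$M_\infty(\omega) = e^{\omega} \omega^{-2}, \quad \omega \in \Omega; \qquad \mathcal{F}_k = \sigma([1, k]) \vee \{(k, \infty)\}, \quad k \ge 1.$$

Since $M_\infty$ is integrable, the conditional expectation $E[M_\infty \mid \mathcal{F}_k]$ is well defined and is $\mathcal{F}_k$-measurable for each $k \ge 1$. Hence with $M_k := E[M_\infty \mid \mathcal{F}_k]$ we obtain the martingale $M = (M_k, \mathcal{F}_k, k \ge 1)$. Let us derive some properties of $M$. For this we use the following representation of $M$:

$$M_k(\omega) = e^{\omega} \omega^{-2} 1_{[1, k]}(\omega) + e^{k} k^{-1} 1_{(k, \infty)}(\omega), \quad k \ge 1.$$

To verify it, we check that $\int_A M_\infty\, dP = \int_A M_k\, dP$ for $A \in \mathcal{F}_k$. For $A \in \sigma([1, k])$ this is trivial, and

$$\int_{(k, \infty)} M_\infty\, dP = \int_k^\infty e^{\omega} \omega^{-2} e\, e^{-\omega}\, d\omega = e^{k} k^{-1} P((k, \infty)) = \int_{(k, \infty)} M_k\, dP.$$

Similar reasoning shows that $M$ is uniformly integrable. The next property of $M$ is based on the variable ($[\omega]$ is the integer part of $\omega$)

$$M^*(\omega) := \sup_{k \ge 1} |M_k(\omega)| = e^{[\omega]} [\omega]^{-1}.$$

Obviously $M^*$ is not integrable, that is $M^* \notin L^1(\Omega, \mathcal{F}, P)$, and the Davis inequality (see e.g. Dellacherie and Meyer (1982) or Liptser and Shiryaev (1989)) implies that $S(M) \notin L^1(\Omega, \mathcal{F}, P)$.

Thus we have described a uniformly integrable martingale whose quadratic variation is not integrable.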
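The non-integrability of $M^*$ can be made concrete numerically. The sketch below assumes the closed form $M^*(\omega) = e^{[\omega]}/[\omega]$ (valid for almost all $\omega$) and the measure $P(d\omega) = e \cdot e^{-\omega} d\omega$ on $[1, \infty)$; the helper names are hypothetical. On each interval $[k, k+1)$ the contribution to $E[M^*]$ is $(e - 1)/k$, so the truncated expectation is $(e - 1)$ times a harmonic sum and grows without bound.

```python
import math

def prob_interval(a, b):
    """P([a, b)) under P(dw) = e * exp(-w) dw on [1, inf)."""
    return math.e * (math.exp(-a) - math.exp(-b))

def truncated_mean_of_sup(K):
    """E[M* 1{w < K}] with M*(w) = e**[w] / [w], summed over the unit intervals [k, k+1)."""
    return sum(math.exp(k) / k * prob_interval(k, k + 1) for k in range(1, K))

def harmonic(m):
    """Partial sum 1 + 1/2 + ... + 1/m of the harmonic series."""
    return sum(1.0 / k for k in range(1, m + 1))
```

Comparing `truncated_mean_of_sup(K)` with `(math.e - 1) * harmonic(K - 1)` for increasing `K` shows the logarithmic divergence of $E[M^*]$.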

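It now remains for us to construct a sequence $(v_k, k \ge 1)$, $v_k \in \{0, 1\}$, such that the partial sums $N_n = \sum_{k=1}^{n} v_k \Delta_k M$ are a.s. convergent as $n \to \infty$ but not $L^1$-convergent.

Since $M_0 := 0$, then $\Delta_1 M(\omega) = M_1(\omega) = e$ and choosing $v_k = \tfrac12 (1 + (-1)^k)$, $k \ge 1$, and using the above representation of $M$ we easily find that

$$N_{2n}(\omega) = \sum_{k=1}^{n} \left\{ \left( \frac{e^{\omega}}{\omega^2} - \frac{e^{2k-1}}{2k-1} \right) 1_{(2k-1, 2k]}(\omega) + \left( \frac{e^{2k}}{2k} - \frac{e^{2k-1}}{2k-1} \right) 1_{(2k, \infty)}(\omega) \right\}.$$

This shows in particular that $N_\infty := \lim_{n \to \infty} N_n$ exists a.s. If we write explicitly $N_{2n}(\omega) 1_{[2l, 2l+1)}(\omega)$ for $l < n$ and denote $B = \bigcup_{l \ge 1} [2l, 2l+1)$, then we see by a direct calculation that

$$\int_B N_\infty\, dP = \sum_{l=1}^{\infty} \sum_{k=1}^{l} \left( \frac{e^{2k}}{2k} - \frac{e^{2k-1}}{2k-1} \right) \int_{2l}^{2l+1} e^{1-\omega}\, d\omega = \infty.$$

Therefore the martingale transform $(N_n, n \ge 1)$ is not $L^1$-convergent.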
It is interesting to note the case when $M_n = \sum_{k=1}^{n} X_k$, $n \ge 1$, with $X_k$ independent r.v.s, $EX_k = 0$, $k \ge 1$. Here uniform integrability of $(M_n, n \ge 1)$ implies integrability of the quadratic variation $S(M)$.


SECTION 23. CONTINUOUS-TIME MARTINGALES 


Suppose we are given a complete probability space $(\Omega, \mathcal{F}, P)$ and a filtration $(\mathcal{F}_t, t \ge 0)$ which satisfies the usual conditions: $\mathcal{F}_t \subset \mathcal{F}$ for each $t$; if $s < t$ then $\mathcal{F}_s \subset \mathcal{F}_t$; $(\mathcal{F}_t)$ is right-continuous; each $\mathcal{F}_t$ contains all $P$-null sets of $\mathcal{F}$. As usual, the notation $(X_t, \mathcal{F}_t, t \ge 0)$ means that the stochastic process $(X_t, t \ge 0)$ is adapted with respect to $(\mathcal{F}_t)$, that is for each $t$, $X_t$ is $\mathcal{F}_t$-measurable.

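The process $X = (X_t, \mathcal{F}_t, t \ge 0)$ with $E|X_t| < \infty$ for all $t \ge 0$ is called a martingale, submartingale or supermartingale, if $s < t$ implies respectively that

$$E[X_t \mid \mathcal{F}_s] = X_s \ \text{a.s.}, \qquad E[X_t \mid \mathcal{F}_s] \ge X_s \ \text{a.s.}, \qquad \text{or} \qquad E[X_t \mid \mathcal{F}_s] \le X_s \ \text{a.s.}$$

We say that the martingale $M = (M_t, \mathcal{F}_t, t \ge 0)$ is an $L^p$-martingale, $p \ge 1$, if $E[|M_t|^p] < \infty$ for all $t \ge 0$. If $p = 2$ we use the term square integrable martingale.

A r.v. $T$ on $\Omega$ with values in $\mathbb{R}^+ \cup \{\infty\}$ is called a stopping time with respect to $(\mathcal{F}_t)$ (or we say that $T$ is an $(\mathcal{F}_t)$-stopping time) if for all $t \in \mathbb{R}^+$, $[T \le t] \in \mathcal{F}_t$.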

Let $X = (X_t, \mathcal{F}_t, t \ge 0)$ be a right-continuous process. $X$ is said to be a local martingale if there exists an increasing sequence $(T_n, n \ge 1)$ of $(\mathcal{F}_t)$-stopping times with $T_n \to \infty$ as $n \to \infty$ such that for each $n$ the process $(X_{t \wedge T_n}, \mathcal{F}_t, t \ge 0)$ is a uniformly integrable martingale. Further, $X$ is called locally square integrable if $(X_{t \wedge T_n}, \mathcal{F}_t, t \ge 0)$ are square integrable martingales, that is if for each $n$, $E[X^2_{t \wedge T_n}] < \infty$.

If $M = (M_t, \mathcal{F}_t, t \ge 0)$ is a square integrable martingale, then there exists a unique predictable increasing process denoted by $\langle M \rangle = (\langle M \rangle_t, \mathcal{F}_t, t \ge 0)$ and called a quadratic variation of $M$, such that $(M_t^2 - \langle M \rangle_t, \mathcal{F}_t, t \ge 0)$ is a martingale.

Suppose $X = (X_t, \mathcal{F}_t, t \ge 0)$ is a cadlag process (that is, $X$ is right-continuous with left-hand limits) where the filtration $(\mathcal{F}_t)$ satisfies the usual conditions, and assume for simplicity that $\mathcal{F}_{0-} = \mathcal{F}_0$, $\mathcal{F}_{\infty-} = \mathcal{F}$. The process $X$ is said to be a semimartingale if it has the following decomposition:

$$X_t = X_0 + M_t + A_t, \quad t \ge 0,$$

where $M = (M_t, \mathcal{F}_t, t \ge 0)$ is a local martingale with $M_0 = 0$, and $A = (A_t, \mathcal{F}_t, t \ge 0)$ is a right-continuous process, $A_0 = 0$, with paths of locally finite variation.

A few other notions will be introduced and analysed in the 
examples below. 

A great number of papers and books devoted to the theory of 
martingales and its various applications have been published 
recently. For an intensive and complete presentation of the theory 
of martingales we refer the reader to books by Dellacherie and 
Meyer (1978, 1982), Jacod (1979), Metivier (1982), Elliott (1982), 
Durrett (1984), Kopp (1984), Jacod and Shiryaev (1987), Liptser 
and Shiryaev (1989), Revuz and Yor (1991) and Karatzas and 
Shreve (1991). 

For the present section we have chosen a few examples which 
illustrate the relationship between different but close classes of 
processes obeying one or another martingale-type property. In 
general, the examples in this section can be considered jointly with 
the examples in Section 22. 


23.1. Martingales which are not locally square integrable 


We now introduce and study close subclasses of martingale-like 
processes. This makes it necessary to compare these subclasses and 
clarify the relationships between them. In particular, the examples 
below show that in general a process can be a martingale without 
being locally square integrable. We shall suppose that the probability space $(\Omega, \mathcal{F}, P)$ is complete and the filtration $(\mathcal{F}_t, t \ge 0)$ satisfies the usual conditions.

(i) Let us construct a uniformly integrable martingale $X = (X_t, \mathcal{F}_t, t \ge 0)$ such that for every $(\mathcal{F}_t)$-stopping time $T$ which is not identically zero, we have $E[X_T^2] = \infty$. Obviously such an $X$ cannot be locally square integrable.


Let $\Omega = \mathbb{R}^+$, $\mathcal{F} = \mathcal{B}^+$ and $\mathcal{F}_t$ be the σ-field generated by $\tau \wedge t$ where $\tau$ is a r.v. distributed exponentially with parameter 1: $P[\tau > x] = e^{-x}$, $x \ge 0$. Moreover, $\mathcal{F}$ and $\mathcal{F}_t$ are assumed to be completed by all $P$-null sets of $\mathcal{F}$. According to Dellacherie (1970) the following two statements hold.

(a) $(\mathcal{F}_t)$ is an increasing right-continuous family of σ-fields without points of discontinuity.

(b) The r.v. $T$ is a stopping time with respect to $(\mathcal{F}_t)$ iff there exists a number $u \in \mathbb{R}^+ \cup \{\infty\}$ such that $T \ge \tau$ a.s. on the set $[\tau \le u]$ and $T = u$ a.s. on the set $[\tau > u]$.

Thus for each stopping time $T$ with $P[T = 0] < 1$ there exists $u \in \mathbb{R}^+ \cup \{\infty\}$, $u > 0$, such that $\tau \wedge u \le T$ a.s.


Consider now the r.v. $Z = \tau^{-1/2} e^{\tau/2} 1_{[0 < \tau \le 1]}$. Obviously we have

$$EZ = \int_0^1 x^{-1/2} e^{x/2} e^{-x}\, dx = \int_0^1 x^{-1/2} e^{-x/2}\, dx < \infty.$$

So $Z$ is an integrable r.v. Take the process $X = (X_t, t \ge 0)$ where

$$X_t = E[Z \mid \mathcal{F}_t], \quad t \ge 0.$$

Then $X$ is a right-continuous martingale which is uniformly integrable.
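The next step is to check whether $X$ is locally square integrable. To see this, we use the following representation found by Doléans-Dade (1971):

$$X_t = Z 1_{[\tau \le t]} + \frac{E[Z 1_{[\tau > t]}]}{P[\tau > t]} 1_{[\tau > t]}.$$

Further, for every $a \in (0, 1)$ we have

$$X^2_{\tau \wedge a} 1_{[\tau \le a]} = \tau^{-1} e^{\tau} 1_{[0 < \tau \le a]} \quad \Longrightarrow \quad E[X^2_{\tau \wedge a}] \ge \int_0^a x^{-1} e^{x} e^{-x}\, dx = \infty.$$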


Now let $T$ be a stopping time such that $P[T = 0] < 1$ and take $a \in (0, 1)$ so that $\tau \wedge a \le T$ a.s. Then the inequality $E[X_T^2] < \infty$, which is necessary for square integrability, is not possible because this would imply that $E[X^2_{\tau \wedge a}] \le E[X_T^2] < \infty$, which leads to a contradiction.

Therefore the martingale $X$ is not locally square integrable.


(ii) Let the r.v. $\tau$ be the moment of the first jump of a homogeneous Poisson process $N = (N_t, t \ge 0)$ with parameter 1. Define the filtration $(\mathcal{F}_t, t \ge 0)$ where $\mathcal{F}_t = \sigma\{N_s, s \le t\}$ and the process $m = (m_t, t \ge 0)$ by

$$m_t = \tau^{-1/2} 1_{[\tau \le t]} - 2\sqrt{\tau \wedge t}.$$

According to Kabanov (1974), the process $m$ has the following representation as a Stieltjes integral:

$$m_t = \int_0^t s^{-1/2} 1_{[\tau \ge s]}\, (dN_s - ds).$$

It can be derived from here that $(m_t, \mathcal{F}_t, t \ge 0)$ is a martingale. It also obeys other properties but the question to ask is whether $m$ is locally square integrable. To answer this we again use the result of Dellacherie (1970) cited above. So, take any $(\mathcal{F}_t)$-stopping time $T$. Then $T \wedge \tau = c \wedge \tau$ for some constant $c$ and for any $c > 0$ we have $E[\tau^{-1} 1_{[\tau \le c]}] = \infty$. Hence $E[m_T^2] = \infty$ and the martingale $m$ is not locally square integrable.

(iii) Let $\xi$ be a r.v. defined on $(\Omega, \mathcal{F}, P)$. Consider the process $M = (M_t, t \ge 0)$ and the filtration $(\mathcal{F}_t, t \ge 0)$ given by


<1 og _ J {0,0}, it Oat 
to |) F =ofé}, if t>1. 


In addition, suppose that $E|\xi| < \infty$ but $E[\xi^2] = \infty$. Then it is easy to verify that $(M_t, \mathcal{F}_t, t \ge 0)$ is a martingale. Following the definition we see that this martingale, which is also a local martingale, is not locally square integrable.


23.2. Every martingale is a weak martingale but the converse is 
not always true 


Let $M = (M_t, \mathcal{F}_t, t \ge 0)$ be a stochastic process. We say that $M$ is a weak martingale if for each $n$ there exists a right-continuous and uniformly integrable martingale $M^n = (M_t^n, \mathcal{F}_t, t \ge 0)$ such that $M_t = M_t^n$ for $0 \le t < T_n$, where $(T_n, n \ge 1)$ is an increasing sequence of $(\mathcal{F}_t)$-stopping times with $T_n \to \infty$ as $n \to \infty$. It is convenient to say that a stopping time $T$ reduces a right-continuous process $M = (M_t, \mathcal{F}_t, t \ge 0)$ if there exists a uniformly integrable martingale $H = (H_t, \mathcal{F}_t, t \ge 0)$ such that $M_t = H_t$ for $0 \le t < T$.

It is obvious from the above definition that every martingale and every local martingale are also weak martingales. This observation leads naturally to the question of whether or not the converse statement is correct. The answer is contained in the next example.

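Let $\pi = (\pi_t, t \ge 0)$ be a Poisson process with parameter $\lambda > 0$, $\pi_0 = 0$ and $(\mathcal{F}_t^\pi, t \ge 0)$ be its own generated filtration: $\mathcal{F}_t^\pi = \sigma\{\pi_s, s \le t\}$. Let $\tau$ be the first jump time of $\pi$, so $\tau$ is an exponential r.v. with parameter $\lambda$. An easy computation shows that

$$E[\tau - \lambda^{-1} \mid \mathcal{F}_t^\pi] = \begin{cases} \tau - \lambda^{-1}, & \text{if } t \ge \tau, \\ t, & \text{if } t < \tau. \end{cases}$$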


This relation will help us to construct the example we require. Indeed, for a suitable probability space, consider a sequence of such independent Poisson processes $\pi^n = (\pi_t^n, t \ge 0)$, $n \ge 1$, where $\pi^n$ has parameter $\lambda_n$, and suppose that $\lambda_n \to 0$ as $n \to \infty$. Let $\tau_n$ be the first jump time of the process $\pi^n$. Denote by $\mathcal{F}_t$ the σ-field generated by the r.v.s $\pi_s^n$ for all $n$ and $s \le t$ and including all sets of measure zero. Thus the family $(\mathcal{F}_t, t \ge 0)$ is right-continuous. Consider the process $M = (M_t, \mathcal{F}_t, t \ge 0)$ where $M_t = t$. Using the independence of the processes $\pi^n$ we obtain analogously that

$$E[\tau_n - \lambda_n^{-1} \mid \mathcal{F}_t] = \begin{cases} \tau_n - \lambda_n^{-1}, & \text{if } t \ge \tau_n, \\ t, & \text{if } t < \tau_n. \end{cases}$$

This relation shows that $\tau_n$ reduces $M$. If we take, for instance, $\lambda_n = n^{-3}$ then the series $\sum_n P[\tau_n \le n] = \sum_n (1 - e^{-\lambda_n n})$ converges and the Borel-Cantelli lemma says that $\tau_n \to \infty$ a.s. as $n \to \infty$. This and a result of Kazamaki (1972a) imply that the process $M$ is a weak martingale. However, $M$ is not a martingale, which is seen immediately if we stop $M$ at a fixed time $u$.

Therefore we have described an example of a continuous and 
bounded weak martingale which is not a martingale. 


23.3. The local martingale property is not always preserved 
under change of time 


Again, let $(\Omega, \mathcal{F}, P)$ be a complete probability space and $(\mathcal{F}_t, t \ge 0)$ a filtration satisfying the usual conditions. All martingales considered here are assumed to be $(\mathcal{F}_t)$-adapted and right-continuous.

By a change of time $(\tau_t, \mathcal{F}_t, t \ge 0)$ we mean a family of $(\mathcal{F}_t)$-stopping times $(\tau_t)$ such that for all $\omega \in \Omega$ the mapping $t \mapsto \tau_t(\omega)$ is increasing and right-continuous.

If $X = (X_t, \mathcal{F}_t, t \ge 0)$ is a stochastic process, denote by $\hat{X} = (X_{\tau_t}, \mathcal{F}_{\tau_t}, t \ge 0)$ the new process obtained from $X$ by a change of time. So if $X$ obeys some useful property, it is of general interest to know whether the new process $\hat{X}$ obeys the same property. In particular, if $X$ is a martingale or a weak martingale we want to know whether under some mild conditions the process $\hat{X}$ is a martingale or a weak martingale respectively (see Kazamaki 1972a, b). Thus we come to the question: does a change of time preserve the local martingale property?

Let $M = (M_t, \mathcal{F}_t, t \ge 0)$, $M_0 = 0$, be a continuous martingale with

$$P[\limsup_{t \to \infty} M_t = \infty] = 1.$$

In particular, we can choose $M$ to be a standard Wiener process $w$. The r.v. $\tau_t$ defined by $\tau_t = \inf\{u : M_u > t\}$ is a finite $(\mathcal{F}_t)$-stopping time. Clearly, $\tau_0 = 0$ and $\tau_\infty = \infty$ a.s. It is easy to see that the change of time $(\tau_t, t \ge 0)$ satisfies the relation $M_{\tau_t} = t$, which is a consequence of the continuity of $M$. However, the process $\hat{M} = (t, \mathcal{F}_{\tau_t}, t \ge 0)$ is not a local martingale.

Therefore in general the local martingale property is not invariant under a change of time. Dellacherie and Meyer (1982) give very general results on semimartingales when the semimartingale property is preserved under a change of time.


23.4. A uniformly integrable supermartingale which does not 
belong to class (D) 


Let $X = (X_t, t \in \mathbb{R}^+)$ be a measurable process. We say that $X$ is bounded in $L^1$ with respect to a given filtration $(\mathcal{F}_t, t \in \mathbb{R}^+)$ if the number

$$\|X\|_1 = \sup_T E[|X_T| 1_{[T < \infty]}],$$

where sup is taken over all $(\mathcal{F}_t)$-stopping times $T$, is finite. If, moreover, all the r.v.s $X_T 1_{[T < \infty]}$ are uniformly integrable, $X$ is said to belong to class (D). Several results characterizing this class can be found in the book by Dellacherie and Meyer (1982). In particular, it is shown there that every discrete-time uniformly integrable supermartingale belongs to class (D). This leads naturally to the question of the validity of a similar result for continuous-time supermartingales. The example below shows that in the continuous case such a result does not hold.


Let $w = (w_t, t \in \mathbb{R}^+)$ be a standard Wiener process in $\mathbb{R}^3$ starting at $t = 0$ at a point $x$ different from the origin. Take the superharmonic function $h(y) = 1/|y|$, $y \in \mathbb{R}^3$ (this is just the so-called Newtonian potential) and consider the stochastic process $X = (X_t, t \in \mathbb{R}^+)$ where $X_t = h(w_t)$. Our purpose now is to study the properties of the process $X$. Since $h$ is a superharmonic function and the process $w$ is a martingale, we conclude that $X$ is a positive supermartingale with respect to the filtration $(\mathcal{F}_t, t \in \mathbb{R}^+)$ with $\mathcal{F}_t = \sigma\{w_s, s \le t\}$. Moreover, $X$ has continuous trajectories. As the trajectories of $w$ in $\mathbb{R}^3$ diverge to infinity as $t \to \infty$ (see Freedman 1971), $X_t \to 0$ a.s. as $t \to \infty$ and we get $X_\infty = 0$. Using the explicit form of the distribution of $w_t$ we find that the expectation $E[X_t]$ is a continuous function of $t$ on $[0, \infty]$. Moreover, for every sequence $(t_n)$ of elements of $[0, \infty]$ converging to $t \in [0, \infty]$ we have $X_{t_n} \to X_t$ in $L^1$. So the mapping $t \mapsto X_t$ of $[0, \infty]$ into the space $L^1$ is continuous and since $[0, \infty]$ is compact, the r.v.s $X_t$, $t \in [0, \infty]$, are uniformly integrable (see Dellacherie and Meyer 1978).

Therefore the process $X$ is a uniformly integrable supermartingale which is even continuous. It remains for us to check if $X$ belongs to class (D). For this purpose we use the following result (see Johnson and Helms 1963). Let $Z$ be a positive right-continuous supermartingale and let

$$\tau_n = \inf\{t : Z_t \ge n\}, \quad n \ge 1.$$

Then $Z$ belongs to class (D) iff $\lim_{n \to \infty} E[Z_{\tau_n} 1_{[\tau_n < \infty]}] = 0$.

In our case the process $X$ is continuous, $X_{\tau_n} = n$ on the set $[\tau_n < \infty]$ where $\tau_n = \inf\{t : X_t \ge n\}$, and obviously $\int_{[\tau_n < \infty]} X_{\tau_n}\, dP = n P[\tau_n < \infty]$. On the other hand, $\tau_n = \inf\{t : |w_t| \le 1/n\}$ and we have

$$P[\tau_n < \infty] = \begin{cases} 1, & \text{if } |x| \le 1/n, \\ (n|x|)^{-1}, & \text{if } |x| > 1/n. \end{cases}$$

Hence $n P[\tau_n < \infty] = 1/|x|$ for sufficiently large $n$, so $n P[\tau_n < \infty]$ does not tend to 0 as $n \to \infty$ and, according to the result of Johnson and Helms (1963) quoted above, the process $X$ does not belong to class (D).


23.5. $L^p$-bounded local martingale which is not a true martingale


Recall that the process $M = (M_t, \mathcal{F}_t, t \ge 0)$ is called an $L^p$-martingale, $p \ge 1$, iff it is a martingale and $M_t \in L^p$ for each $t \ge 0$. If $\sup_t E[|M_t|^p] < \infty$ we say that $M$ is $L^p$-bounded. For simplicity, let $M_0 = 0$. For $p \in [1, \infty)$, $M$ is called a local $L^p$-martingale if there is a sequence $\{\tau_n, n \ge 1\}$ of $(\mathcal{F}_t)$-stopping times such that $\tau_n \uparrow \infty$ as $n \to \infty$ and for each $n$ the process $M^n = (M_{t \wedge \tau_n}, \mathcal{F}_t, t \ge 0)$ is an $L^p$-martingale.

In Example 23.1 we established that there are martingales and local martingales which are not locally square integrable. Similarly, we shall show below that an $L^p$-bounded local martingale need not be a true martingale.


(i) Let $w = (w_t, t \ge 0)$ be a standard Wiener process in $\mathbb{R}^3$. Let $h : \mathbb{R}^3 \setminus \{0\} \to \mathbb{R}^1$ be a function defined by $h(x) = |x|^{-1}$ for $x \in \mathbb{R}^3 \setminus \{0\}$ and let $\tau_n = \inf\{t \ge 0 : |w_t| \le 1/n\}$. Then $\{\tau_n, n \ge 1\}$ is an increasing sequence of $(\mathcal{F}_t)$-stopping times, $\mathcal{F}_t = \mathcal{F}_t^w$, with $\tau_n \to \infty$ a.s. as $n \to \infty$. The function $h$ is harmonic in the domain $\mathbb{R}^3 \setminus \{0\}$ which obviously contains the domain $D_n = \{x : |x| > n^{-1}\}$ for each $n \ge 1$. Define a function $g_n$ on the closure $\bar{D}_n$ of $D_n$ by $g_n(x) = E_x[h(w_{\tau_n})]$, $x \in \bar{D}_n$, where $E_x$ denotes the expectation given $w_0 = x$ a.s. Since $w$ is a strong Markov process (see Dynkin 1965; Freedman 1971; Wentzell 1981) with spherical symmetry, $g_n$ possesses the mean-value property that its average value over the surface of any sufficiently small ball about $x \in D_n$ equals its value at $x$ (see Dynkin 1965). This implies that $g_n$ is a harmonic function in $D_n$ and it can be shown that $g_n$ is continuous in $\bar{D}_n$ with boundary values equal to those of the function $h$. By the maximum principle for harmonic functions we conclude that $g_n = h$ in $\bar{D}_n$ for all $n$. Moreover, for $n \ge 1$, $x \in D_n$ and each fixed $t$ we have the following relation:

$$E_x[h(w_{\tau_n}) \mid \mathcal{F}_t] = 1_{[\tau_n \le t]} h(w_{t \wedge \tau_n}) + 1_{[\tau_n > t]} E_x[h(w_{\tau_n}) \mid \mathcal{F}_t] \quad \text{a.s.}$$

The strong Markov property of $w$ gives

$$E_x[h(w_{\tau_n}) \mid \mathcal{F}_t] = E_{w_t}[h(w_{\tau_n})] = g_n(w_t) \quad \text{a.s.}$$

on the set $[\tau_n > t]$. So if we combine these two relations and take into account that $g_n = h$ in $\bar{D}_n$ we have the equality

$$E_x[h(w_{\tau_n}) \mid \mathcal{F}_t] = h(w_{t \wedge \tau_n}) \quad \text{a.s.}$$

Recall now that the initial state of the Wiener process is $w_0 \ne (0, 0, 0)$ and let $w_0 = x_0$. Then for all sufficiently large $n$ we have $x_0 \in D_n$. Thus we conclude that the process $(h(w_{t \wedge \tau_n}), t \ge 0)$ is a bounded martingale. This implies that $(h(w_t), t \ge 0)$ is a local martingale. So it remains for us to clarify whether this local martingale is a true martingale. We have

$$E_{x_0}[h(w_0)] = |x_0|^{-1}$$

and we want to find $E_{x_0}[h(w_t)]$. If $t > 0$ and $c > 2|x_0|$ then

Here $c_1 > 0$ and $c_2 > 0$ are constants not depending on $c$ and $t$. Obviously, if $t \to \infty$, $c \to \infty$ and $c t^{-3/4} \to 0$ then $E_{x_0}[h(w_t)] \to 0$. Hence for all sufficiently large $t$ we obtain

$$E_{x_0}[h(w_t)] \ne E_{x_0}[h(w_0)].$$

This relation means that the process $(h(w_t), t \ge 0)$ is not a true martingale despite the fact that it is a local martingale.
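The decay of $E_{x_0}[h(w_t)]$ can also be seen numerically. The sketch below relies on a closed form that is not stated in the text and is used here as an assumption: for a three-dimensional standard Wiener process started at distance $r_0 = |x_0| > 0$ from the origin, $E_{x_0}[1/|w_t|] = r_0^{-1} \operatorname{erf}(r_0/\sqrt{2t})$, which follows from the mean-value property of the Newtonian potential over spheres. The function name is hypothetical.

```python
import math

def mean_inverse_distance(r0, t):
    """E[1/|w_t|] for a 3-d Wiener process started at distance r0 > 0 from the origin.

    Uses the assumed closed form erf(r0 / sqrt(2 t)) / r0 for t > 0.
    """
    if t == 0:
        return 1.0 / r0
    return math.erf(r0 / math.sqrt(2.0 * t)) / r0
```

For a true martingale the expectation would stay equal to `1 / r0` for all `t`. Here it is strictly smaller for every `t > 0` and tends to 0 as `t` grows, in agreement with the conclusion above.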


A calculation similar to the one above shows that $h(w_t) \in L^2$ for each $t$ and also $\sup_t E[h^2(w_t)] < \infty$. Therefore $(h(w_t), t \ge 0)$ is an $L^2$-bounded local martingale although, let us repeat, it is not a true martingale. It would be useful for the reader to compare this case with Example 23.4.


(ii) Let us briefly consider another interesting example. Let $X = (X_t, t \ge 0)$ be a Bessel process of order $l$, $l > 2$. Recall that $X$ is a continuous Markov process whose infinitesimal operator on the space of twice differentiable functions has the form

$$\frac{1}{2} \frac{d^2}{dx^2} + \frac{l - 1}{2x} \frac{d}{dx}.$$

Note that if $l$ is an integer, $X$ is identical in law with the process $(|w(t)|, t \ge 0)$ where $|w(t)| = (w_1^2(t) + \cdots + w_l^2(t))^{1/2}$ and $((w_1(t), \ldots, w_l(t)), t \ge 0)$ is a standard Wiener process in $\mathbb{R}^l$ (see Dynkin 1965; Rogers and Williams 1994).

Suppose $X$ starts from a point $x > 0$, that is $X_0 = x$ a.s., and consider the process $M = (M_t, t \ge 0)$ where

$$M_t = X_t^{2-l}.$$

If $\mathcal{F}_t = \sigma\{X_s, s \le t\}$ then it can be shown that $(M_t, \mathcal{F}_t, t \ge 0)$ is a continuous local martingale which, however, is not a martingale because $EM_t$ vanishes when $t \to \infty$ (compare with case (i)). On the other hand, $E|M_t|^p < \infty$ for any $p$ such that $p < l/(l - 2)$. Thus, if $l$ is close to 2, $p$ is 'big enough' and we have a continuous local martingale which is 'sufficiently' integrable in the sense that $M$ belongs to the space $L^p$ for 'sufficiently' large $p$; despite this fact, the process $M$ is not a true martingale.


23.6. A sufficient but not necessary condition for a process to 
be a local martingale 


We shall start by considering the following. Let $X = (X_t, \mathcal{F}_t, t \ge 0)$ be a cadlag process with $X_0 = 0$ and $A = (A_t, \mathcal{F}_t, t \ge 0)$ be a continuous increasing process such that $A_0 = 0$. Assume that for all $\lambda \in \mathbb{R}^1$ the process $Z^\lambda = (Z_t^\lambda, \mathcal{F}_t, t \ge 0)$ defined by

$$Z_t^\lambda = \exp(\lambda X_t - \tfrac12 \lambda^2 A_t)$$

is a local martingale. Then $X$ is a continuous local martingale and $A = \langle X \rangle$. Here $A = \langle X \rangle$ is the unique predictable process of finite variation such that $X^2 - \langle X \rangle$ is a martingale. (For details see Dellacherie and Meyer (1982) or Métivier (1982).)

This result is due to M. Yor and is presented here in a form suggested by C. Stricker. It can also be found in a paper by Meyer and Zheng (1984).

Now we shall show that the continuity of $A$ and the condition $X_0 = 0$ are essential for the validity of this result.

Let $X^n = (X_t^n, \mathcal{F}_t, t \ge 0)$ be a sequence of centred Gaussian martingales such that $X^n$ has the following increasing process $A^n = (A_t^n, \mathcal{F}_t, t \ge 0)$:

$$A_t^n = \begin{cases} 0, & \text{if } t \le c, \\ 1, & \text{if } t \ge c + 1/n, \\ \text{linear in between}, \end{cases} \qquad c = \text{constant}.$$

We now consider the limiting case as $n \to \infty$. Referring the reader to a paper by Meyer and Zheng (1984) for details, we get $A_t^n \to A_t$ and $X_t^n \to X_t$ weakly in the space $\mathbb{D}$ where $A_t = 1_{[t \ge c]}$ and $X_t = \zeta 1_{[t \ge c]}$ with $\zeta$ a r.v. distributed $N(0, 1)$. It is not difficult to check that for each $\lambda$ the process $Z^\lambda = (Z_t^\lambda, \mathcal{F}_t, t \ge 0)$ where $Z_t^\lambda = \exp(\lambda X_t - \tfrac12 \lambda^2 A_t)$ is a martingale. However, neither $A$ nor $X$ is continuous. Moreover, if $c = 0$, the property $X_0 = 0$ a.s. no longer holds.


23.7. A square integrable martingale with a non-random 
characteristic need not be a process with independent 
increments 


Let $X = (X_t, \mathcal{F}_t, t \ge 0)$ be a square integrable martingale defined on the complete probability space $(\Omega, \mathcal{F}, P)$ where the filtration $(\mathcal{F}_t, t \ge 0)$ satisfies the usual conditions. The well known Lévy theorem asserts that if $X$ is continuous and its characteristic $\langle X \rangle$ is deterministic, then $X$ is a Gaussian process with independent increments (see Grigelionis 1977; Jacod 1979).

Our purpose now is to answer the following question. Is it true that any square integrable martingale $X$ with a non-random characteristic $\langle X \rangle$ is a process with independent increments?

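Let $\Omega = [0, 1]$, $P$ be the Lebesgue measure and the σ-field $\mathcal{F}$ be generated by the following three r.v.s $\eta_0$, $\eta_1$ and $\eta_2$, where

$$\eta_0 = 0 \quad \text{for all } \omega \in \Omega,$$

$$\eta_1 = \begin{cases} -1, & \text{if } \omega \in [0, \tfrac12), \\ 1, & \text{if } \omega \in [\tfrac12, 1], \end{cases}$$

$$\eta_2 = \begin{cases} -2, & \text{if } \omega \in [0, \tfrac14), \\ 0, & \text{if } \omega \in [\tfrac14, \tfrac12), \\ 1 - \sqrt{3/2}, & \text{if } \omega \in [\tfrac12, \tfrac23), \\ 1, & \text{if } \omega \in [\tfrac23, \tfrac56), \\ 1 + \sqrt{3/2}, & \text{if } \omega \in [\tfrac56, 1]. \end{cases}$$

Denote by $F_i$ the σ-field generated by the r.v. $\eta_i$, $i = 0, 1, 2$, and fix the points $s_0 = 0$, $s_1 = 1$, $s_2 = 2$, $s_3 = \infty$. Consider the stochastic process $X = (X_t, t \ge 0)$ defined by

$$X_t = \sum_{k=0}^{2} \eta_k 1_{[s_k, s_{k+1})}(t), \quad t \ge 0,$$

and introduce the family $(\mathcal{F}_t, t \ge 0)$ of increasing and right-continuous sub-σ-fields of $\mathcal{F}$ where $\mathcal{F}_t = F_k$ for $t \in [s_k, s_{k+1})$, $k = 0, 1, 2$. It is easy to check that $X = (X_t, \mathcal{F}_t, t \ge 0)$ is a martingale (and is bounded). Moreover, its characteristic $\langle X \rangle$ can be found explicitly, namely:

$$\langle X \rangle_t = \begin{cases} 0, & \text{if } 0 \le t < 1, \\ 1, & \text{if } 1 \le t < 2, \\ 2, & \text{if } t \ge 2. \end{cases}$$

Obviously the characteristic $\langle X \rangle$ is non-random. Further, the relations

$$P[X_1 - X_0 = 1, X_2 - X_1 = 1] = 0 \ne \tfrac18 = P[X_1 - X_0 = 1]\, P[X_2 - X_1 = 1]$$

imply that the increments of the process $X$ are not independent.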

Therefore we have constructed a square integrable martingale whose characteristic $\langle X \rangle$ is non-random, but this does not imply that $X$ has independent increments. It may be noted that here the process $X$ varies only by jumps while in the Lévy theorem $X$ is supposed to be continuous. Thus the continuity condition is essential for this result.

A correct generalization of the Lévy theorem to arbitrary square 
integrable martingales (not necessarily continuous) was given by 
Grigelionis (1977). (See also the books of Liptser and Shiryaev 
1989 or Jacod and Shiryaev 1987.) 
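The properties claimed for this three-step example can be checked mechanically. The sketch below is hedged: it assumes a particular reading of the construction, namely $\eta_1 = -1$ on $[0, 1/2)$ and $\eta_1 = 1$ on $[1/2, 1]$, with $\eta_2$ taking the values $-2$, $0$, $1 - \sqrt{3/2}$, $1$, $1 + \sqrt{3/2}$ on the intervals with endpoints $0, 1/4, 1/2, 2/3, 5/6, 1$. The names `PIECES`, `cond_moments` and `prob` are hypothetical.

```python
import math
from fractions import Fraction as F

R = math.sqrt(1.5)
# (left endpoint, right endpoint, eta_1, eta_2) on each piece of Omega = [0, 1]
PIECES = [
    (F(0), F(1, 4), -1, -2.0),
    (F(1, 4), F(1, 2), -1, 0.0),
    (F(1, 2), F(2, 3), 1, 1.0 - R),
    (F(2, 3), F(5, 6), 1, 1.0),
    (F(5, 6), F(1), 1, 1.0 + R),
]

def cond_moments(eta1):
    """Return (E[eta_2 | eta_1], E[(eta_2 - eta_1)^2 | eta_1]) under Lebesgue measure."""
    rows = [(float(b - a), e2) for a, b, e1, e2 in PIECES if e1 == eta1]
    mass = sum(p for p, _ in rows)
    mean = sum(p * e2 for p, e2 in rows) / mass
    var = sum(p * (e2 - eta1) ** 2 for p, e2 in rows) / mass
    return mean, var

def prob(event):
    """Lebesgue measure of the pieces on which event(eta1, eta2) holds."""
    return sum(b - a for a, b, e1, e2 in PIECES if event(e1, e2))
```

Under these assumptions `cond_moments` returns (up to rounding) the pair `(eta1, 1.0)` for both values of `eta1`, which corresponds to the martingale property and to a deterministic unit increase of the characteristic at time 2, while `prob` can be used to compare the joint probability of two unit increments with the product of the marginal probabilities.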


23.8. The time-reversal of a semimartingale can fail to be a 
semimartingale 


Let $w = (w_t, t \ge 0)$ be a standard Wiener process in $\mathbb{R}^1$. Take some measurable function $h$ which maps the space $C[0, 1]$ one-one to the interval $[0, 1]$. Define the r.v. $\tau = \tau(\omega) = h(\{w_s(\omega), 0 \le s \le 1\})$ and the process $X = (X_t, t \ge 0)$ where

$$X_t = \begin{cases} w_t, & \text{if } 0 \le t \le 1, \\ w_1, & \text{if } 1 < t \le 1 + \tau, \\ w_{t - \tau}, & \text{if } t > 1 + \tau. \end{cases}$$

Thus $X$ is a Wiener process with a flat spot of length $\tau \le 1$ interpolated from $t = 1$ to $t = 1 + \tau$. Since $\tau$ is measurable with respect to the σ-field $\sigma\{X_s, s \le 1\}$, it is easy to see that $X$ is a martingale (and hence a semimartingale).

Now we shall reverse the process $X$ from the time $t = 2$. Let

$$\tilde{X}_t = X_{2 - t} \quad \text{for } 0 \le t \le 2.$$

Denote by $(\tilde{\mathcal{F}}_t)$ the natural filtration of $\tilde{X}$. Note that the variable $\tau$ is $\tilde{\mathcal{F}}_1$-measurable, hence so is $\{X_t, 0 \le t \le 1\} = h^{-1}(\tau)$, whose time-reversal is $\{\tilde{X}_t, 1 \le t \le 2\}$. Thus $\tilde{\mathcal{F}}_t = \tilde{\mathcal{F}}_1$ for $1 \le t \le 2$. This means that any martingale with respect to the filtration $(\tilde{\mathcal{F}}_t)$ will be constant on the interval (1,2) and any semimartingale will have a finite variation there. However, the Wiener process $w$ has an infinite variation on each interval and therefore $\tilde{X}$ has an infinite variation on the interval (1,2). Hence $\tilde{X}$, which was defined as the time-reversal of $X$, is not a semimartingale relative to its own generated filtration $(\tilde{\mathcal{F}}_t)$. According to a result by Stricker (1977), the process $\tilde{X}$ cannot be a semimartingale with respect to any other filtration.


23.9. Functions of semimartingales which are not 
semimartingales 

Let $X = (X_t, \mathcal{F}_t, t \ge 0)$ be a semimartingale on the complete probability space $(\Omega, \mathcal{F}, P)$, where the family of σ-fields $(\mathcal{F}_t, t \ge 0)$ satisfies the usual conditions. The following result is well known and often used. If $f(x)$, $x \in \mathbb{R}^1$, is a function of the space $C^2(\mathbb{R}^1)$ or $f$ is a difference of two convex functions, then the process $Y = (Y_t, \mathcal{F}_t, t \ge 0)$ where $Y_t = f(X_t)$ is again a semimartingale (see Dellacherie and Meyer 1982).

In general, it is not surprising that for some ‘bad’ functions f the 
process Y = f(X) fails to be a semimartingale. However, it would be 
useful to have at least one particular example of this kind. 

Take the function $f(x) = |x|^\alpha$, $0 < \alpha < 1$. Consider the process $Y = f(X)$, that is $Y = |X|^\alpha$, and try to clarify whether $Y$ is a semimartingale. In order to do this we need the following result (see Yor 1978): if $X$ is a continuous local martingale, $X_0 = 0$ a.s., then statements (a) and (b) below are equivalent:

(a) $X \equiv 0$; (b) the local time of $X$ at 0 is $L^0 \equiv 0$.

Let us suppose that the process $Y = |X|^\alpha$ is a semimartingale. Then applying the Itô formula (see Dellacherie and Meyer 1982; Elliott 1982; Métivier 1982; Chung and Williams 1990) for $\beta = 1/\alpha > 1$ we obtain

$$|X_t| = Y_t^\beta = \beta \int_0^t Y_s^{\beta - 1}\, dY_s + \tfrac12 \beta (\beta - 1) \int_0^t Y_s^{\beta - 2}\, d\langle Y \rangle_s$$

and

$$L_t^0 = \int_0^t 1_{[X_s = 0]}\, d|X_s| = \int_0^t 1_{[Y_s = 0]}\, d(Y_s^\beta) = 0, \quad t \ge 0.$$

Thus by the above result we can conclude that $X \equiv 0$. This contradiction shows that the process $Y = |X|^\alpha$ is not a semimartingale.
The following particular case of this example is fairly well known. If $w$ is the standard Wiener process, then $|w|^\alpha$, $0 < \alpha < \tfrac12$, is not a semimartingale (see Protter 1990). Other useful facts concerning the semimartingale properties of functions of semimartingales can be found in the books by Yor (1978), Liptser and Shiryaev (1989), Protter (1990), Revuz and Yor (1991), Karatzas and Shreve (1991) and Yor (1992, 1996).


23.10. Gaussian processes which are not semimartingales 


One of the 'best' representatives of Gaussian processes is the Wiener process which is also a martingale, and hence a semimartingale. Since any Gaussian process is square integrable, it seems natural to ask the following questions. What is the relationship between the Gaussian and semimartingale properties of a stochastic process? In particular, is any Gaussian process a semimartingale?

Our aim now is to construct a family $\{X^{(\alpha)}\}$ of Gaussian processes depending on a parameter $\alpha$ such that for some $\alpha$, $X^{(\alpha)}$ is a semimartingale, while for other $\alpha$, it is not. Indeed, consider the function

$$K^{(\alpha)}(s, t) = \tfrac12 (s^\alpha + t^\alpha - |s - t|^\alpha), \quad s, t \in \mathbb{R}^+, \quad \alpha \in [1, 2].$$

It can be shown that for each $\alpha \in [1, 2]$ the function $K^{(\alpha)}$ is positive definite. This implies (see Doob 1953; Ash and Gardner 1975) that for each $\alpha \in [1, 2]$ there exists a Gaussian process, say $X^{(\alpha)} = (X_t^{(\alpha)}, t \in \mathbb{R}^+)$, such that $EX_t^{(\alpha)} = 0$ and its covariance function is $E[X_s^{(\alpha)} X_t^{(\alpha)}] = K^{(\alpha)}(s, t)$, $s, t \in \mathbb{R}^+$.
The next step is to verify whether or not the process $X^{(\alpha)}$ is a semimartingale (with respect to its natural filtration). It is easy to see that for $\alpha = 1$ we have $K^{(1)}(s, t) = \min\{s, t\}$. This fact and the continuity of any of the processes $X^{(\alpha)}$ imply that $X^{(1)}$ is the standard Wiener process. Further, if $\alpha = 2$ we obtain that $X_t^{(2)} = t\zeta$ where $\zeta$ is a r.v. distributed $N(0, 1)$. Therefore in these two particular cases, $\alpha = 1$ and $\alpha = 2$, the corresponding Gaussian processes $X^{(1)}$ and $X^{(2)}$ are semimartingales. To determine what happens if $1 < \alpha < 2$ we need the following result of A. Butov (see Liptser and Shiryaev 1989). Suppose $X = (X_t, t \ge 0)$ is a Gaussian process with zero mean and covariance function $\Gamma(s, t)$, $s, t \ge 0$, and conditions (a) and (b) below are satisfied.


(a) There does not exist a non-negative and non-decreasing function $F$ of bounded variation such that $(\Gamma(t, t) + \Gamma(s, s) - 2\Gamma(s, t))^{1/2} \le F(t) - F(s)$, $s < t$.

(b) For any interval $[0, T] \subset \mathbb{R}^+$ and any partition $0 = t_0 < t_1 < \ldots < t_n = T$ with $\max_k (t_{k+1} - t_k) \to 0$ we have

$$\sum_{k=0}^{n-1} E[(X_{t_{k+1}} - X_{t_k})^2] \to 0 \quad \text{as } n \to \infty.$$

Then the process $X$ is not a semimartingale.


Now let us check conditions (a) and (b) for the process $X^{(\alpha)}$. We have

$$(K^{(\alpha)}(t, t) + K^{(\alpha)}(s, s) - 2K^{(\alpha)}(s, t))^{1/2} = |t - s|^{\alpha/2}.$$

However, the function $|t - s|^{\alpha/2}$ with $1 < \alpha < 2$ is not representable in the form $F(t) - F(s)$ for some non-negative and non-decreasing $F$ of bounded variation. So condition (a) is satisfied. Furthermore, for $t > s$ we can easily calculate that $E[(X_t^{(\alpha)} - X_s^{(\alpha)})^2] = |t - s|^\alpha$. It follows that

$$\sum_{k=0}^{n-1} E[(X_{t_{k+1}}^{(\alpha)} - X_{t_k}^{(\alpha)})^2] \le T \max_k (t_{k+1} - t_k)^{\alpha - 1} \to 0 \quad \text{as } n \to \infty,$$

which implies the validity of condition (b).

Thus the Gaussian process $X^{(\alpha)}$ is not a semimartingale if $1 < \alpha < 2$.

Therefore we have constructed the family $\{X^{(\alpha)}, 1 \le \alpha \le 2\}$ of Gaussian (indeed, continuous) processes such that some members of this family, those for $\alpha = 1$ and $\alpha = 2$, are semimartingales, while others, when $1 < \alpha < 2$, are not semimartingales.
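The covariance computations used in this example are easy to check numerically. The following sketch evaluates $K^{(\alpha)}$, the variance of an increment expressed through the covariance, and the sum appearing in condition (b); the function names and the choice of a uniform partition of $[0, T]$ are illustrative assumptions.

```python
def K(s, t, alpha):
    """Covariance K^(alpha)(s, t) = (s**alpha + t**alpha - |s - t|**alpha) / 2."""
    return 0.5 * (s ** alpha + t ** alpha - abs(s - t) ** alpha)

def increment_variance(s, t, alpha):
    """E[(X_t - X_s)^2] expressed through the covariance function."""
    return K(t, t, alpha) + K(s, s, alpha) - 2.0 * K(s, t, alpha)

def sum_of_squares(T, n, alpha):
    """Sum of E[(X_{t_{k+1}} - X_{t_k})^2] over the uniform partition of [0, T] into n steps."""
    grid = [T * k / n for k in range(n + 1)]
    return sum(increment_variance(a, b, alpha) for a, b in zip(grid, grid[1:]))
```

For the uniform partition the sum equals $T^\alpha n^{1 - \alpha}$, so it stays equal to $T$ when $\alpha = 1$ (the Wiener case) and tends to 0 as $n$ grows when $1 < \alpha < 2$, which is exactly condition (b).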

Consider another interpretation of the above case. Recall that a
fractional standard Brownian motion $B_H = (B_H(t), t \in \mathbb{R}^1)$ with
scaling parameter $H$, $0 < H < 1$, is a Gaussian process with zero
mean, $B_H(0) = 0$ a.s. and covariance function

$r(s, t) = E[B_H(s) B_H(t)] = \tfrac{1}{2}\left[|s|^{2H} + |t|^{2H} - |s - t|^{2H}\right], \quad s, t \in \mathbb{R}^1$

(compare $r(s, t)$ with $K^{(\alpha)}(s, t)$ above) (see Mandelbrot and Van
Ness 1968).

Hence for any $H$, $\tfrac{1}{2} < H < 1$, the fractional Brownian motion $B_H$
is not a semimartingale.
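
As a small numerical illustration of condition (b), the sketch below evaluates the sum of expected squared increments of a fractional Brownian motion over a uniform partition, using only the covariance $r(s, t)$ quoted above. The function names and the choice of a uniform partition are assumptions made for this illustration and do not come from the source. For $H = 1/2$ the sum stays equal to the length of the interval, while for $H > 1/2$ it decays like $n^{1 - 2H}$.

```python
def fbm_cov(s: float, t: float, hurst: float) -> float:
    """Covariance r(s, t) of fractional Brownian motion with index `hurst`."""
    p = 2.0 * hurst
    return 0.5 * (abs(s) ** p + abs(t) ** p - abs(s - t) ** p)


def sum_expected_sq_increments(n: int, hurst: float, horizon: float = 1.0) -> float:
    """Sum of E[(B(t_{k+1}) - B(t_k))^2] over a uniform n-cell partition of [0, horizon]."""
    ts = [horizon * k / n for k in range(n + 1)]
    total = 0.0
    for a, b in zip(ts, ts[1:]):
        # E[(B(b) - B(a))^2] = r(b, b) + r(a, a) - 2 r(a, b) = |b - a|^(2H)
        total += fbm_cov(b, b, hurst) + fbm_cov(a, a, hurst) - 2.0 * fbm_cov(a, b, hurst)
    return total


if __name__ == "__main__":
    for h in (0.5, 0.75):
        print(h, [sum_expected_sq_increments(n, h) for n in (10, 100, 1000)])
```

The decay for $H > 1/2$ is exactly what condition (b) asks for; condition (a) has to be checked separately, as in the text.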

A very interesting general problem (posed as far as we know by 
A. N. Shiryaev) is to characterize the class of Gaussian processes 
which are also semimartingales. Useful results on this topic can be 
found in papers by Emery (1982), Jain and Monrad (1982), 
Stricker (1983), Enchev (1984, 1988) and Galchuk (1985) and the
book by Liptser and Shiryaev (1989). 


23.11. On the possibility of representing a martingale as a 
stochastic integral with respect to another martingale 


(i) Let the process $X = (X_t, t \in [0, T])$ be a martingale relative to its
own generated filtration $(\mathcal{F}_t^X, t \in [0, T])$. Suppose
$M = (M_t, t \in [0, T])$ is another process which is a martingale with
respect to $(\mathcal{F}_t^X)$. The question is whether $M$ can be represented
as a stochastic integral with respect to $X$, that is whether there
exists a 'suitable' function $(\varphi_s, s \in [0, T])$ such that
$M_t = \int_0^t \varphi_s\, dX_s$. One reason for asking this question is that
there is an important case when the answer is positive, e.g. when
$X$ is a standard Wiener process (see Clark 1970; Liptser and
Shiryaev 1977/78; Dudley 1977).

The following example shows that in some cases the answer to
the above question is negative.

Consider two independent Wiener processes, say $w = (w_t, t \ge 0)$
and $v = (v_t, t \ge 0)$. Let $X_t = \int_0^t w_s\, dv_s$ and
$\mathcal{F}_t^X = \sigma\{X_s, s \le t\}$. Then $\langle X \rangle_t$ is
$\mathcal{F}_t^X$-measurable and since $\langle X \rangle_t = \int_0^t w_s^2\, ds$ it follows
that $w_t^2$ is $\mathcal{F}_t^X$-measurable. Hence the process $M = (M_t, t \ge 0)$
where $M_t = w_t^2 - t$ is an $L^2$-martingale with respect to the
filtration $(\mathcal{F}_t^X, t \ge 0)$. Suppose now that the martingale $M$ can be
represented as a stochastic integral with respect to $X$: that is, for
some predictable function $(H_s(\omega), s \ge 0)$ with
$E[\int_0^t H_s^2\, d\langle X \rangle_s] < \infty$ we have $M_t = \int_0^t H_s\, dX_s$, $t \ge 0$.
Since by the Itô formula we have $M_t = 2\int_0^t w_s\, dw_s$ it follows that

$M_t = 2\int_0^t w_s\, dw_s = \int_0^t H_s\, dX_s = \int_0^t H_s w_s\, dv_s.$

These relations imply that

$0 = E\left[\left(2\int_0^t w_s\, dw_s - \int_0^t H_s w_s\, dv_s\right)^2\right] = 4E\int_0^t w_s^2\, ds + E\int_0^t H_s^2 w_s^2\, ds$

which of course is not possible.

Therefore the martingale $M = (M_t, \mathcal{F}_t^X, t \ge 0)$ cannot be
represented as a stochastic integral with respect to the martingale
$X$.

(ii) Let $X$ be a r.v. which is measurable with respect to the σ-field
$\mathcal{F}_1^w$ generated by the Wiener process $w$ in the interval [0,1]. Clearly
in this case $X$ is a functional of the Wiener process and it is natural
to expect that $X$ has some representation through $w$. The following
useful result can be found in the book by Liptser and Shiryaev
(1977/78). Let the r.v. $X$ be square integrable, that is $E[X^2] < \infty$.
Suppose additionally that the r.v. $X$ and the Wiener process
$w = (w(t), t \in [0,1])$ form a Gaussian system. Then there exists a
deterministic measurable function $g(t)$, $t \in [0,1]$, with
$\int_0^1 g^2(t)\, dt < \infty$ such that

(1) $X = EX + \int_0^1 g(t)\, dw(t).$

We now want to show that the conditions ensuring the validity
of this result cannot be weakened. In particular, the condition that
the pair $(X, w)$ is a Gaussian system cannot be removed. Indeed,
consider the process

$X_t = \int_0^t h(w(s))\, dw(s), \quad t \in [0,1]$

where $h(x) = 1$ if $x \ge 0$ and $h(x) = -1$ if $x < 0$. It is easy to check
that $(X_t, t \in [0,1])$ is a Wiener process. Therefore the r.v. $X = X_1$ is
a Gaussian and $\mathcal{F}_1^w$-measurable r.v. However, $X$ cannot be
represented in the form given by (1) with a deterministic function
$g$.


SECTION 24. POISSON PROCESS AND WIENER 
PROCESS 


The Poisson process and the Wiener process play important roles 
in the theory of stochastic processes, similar to the roles of the 
Poisson and the normal distributions in the theory of probability. In 
previous sections we considered the Poisson and the Wiener 
processes in order to illustrate some basic properties of stochastic 
processes. Here we shall analyse other properties of these two 
processes, but for convenience let us give the corresponding 
definitions again. 

We say that $w = (w_t, t \ge 0)$ is a standard Wiener process if:
(i) $w_0 = 0$ a.s.;
(ii) any increment $w_t - w_s$ where $s < t$ is distributed normally,
$N(0, t - s)$;
(iii) for each $n \ge 3$ and any $0 \le t_1 < t_2 < \dots < t_n$ the increments
$w_{t_2} - w_{t_1}, w_{t_3} - w_{t_2}, \dots, w_{t_n} - w_{t_{n-1}}$ are independent.

The process $N = (N_t, t \ge 0)$ is said to be a (homogeneous)
Poisson process with parameter $\lambda$, $\lambda > 0$, if: (i) $N_0 = 0$ a.s.; (ii) any
increment $N_t - N_s$ where $s < t$ has a Poisson distribution with
parameter $\lambda(t - s)$; (iii) for each $n \ge 3$ and any $0 \le t_1 < t_2 < \dots < t_n$
the increments $N_{t_2} - N_{t_1}, N_{t_3} - N_{t_2}, \dots, N_{t_n} - N_{t_{n-1}}$ are independent.

Note that the processes w and N can also be defined in different 
but equivalent ways. In particular we can consider the non- 
standard Wiener process, the Wiener process with drift, the non- 
homogeneous Poisson process, etc. Another possibility is to give 
the martingale characterization of each of these processes. The 
reader can find numerous important and interesting results 
concerning the Wiener and Poisson processes in the books by 
Freedman (1971), Yeh (1973), Cinlar (1975), Liptser and Shiryaev 
(1977/78), Wentzell (1981), Chung (1982), Durrett (1984), Kopp 
(1984), Protter (1990), Karatzas and Shreve (1991), Revuz and Yor 
(1991), Yor (1992, 1996) and Rogers and Williams (1994). 


24.1. On some elementary properties of the Poisson process 
and the Wiener process 


(i) Take the standard Wiener process $w$ and the Poisson process $N$
with parameter 1 and let $\tilde{N} = (\tilde{N}_t, t \ge 0)$ where $\tilde{N}_t = N_t - t$ is the
so-called centred Poisson process. It is easy to calculate their
covariance functions:

$C_w(s, t) = \min\{s, t\}, \quad C_{\tilde{N}}(s, t) = \min\{s, t\}, \quad s, t \ge 0.$

Therefore these two quite different processes have the same
covariance functions.


Further, if we denote $\mathcal{F}_t^w = \sigma\{w_s, s \le t\}$ and $\mathcal{F}_t^N = \sigma\{N_s, s \le t\}$
then each of the processes $(w_t, \mathcal{F}_t^w, t \ge 0)$ and $(\tilde{N}_t, \mathcal{F}_t^N, t \ge 0)$ is a
square integrable martingale. Recall that for every square
integrable martingale $M = (M_t, \mathcal{F}_t, t \ge 0)$ we can find a unique
process denoted by $\langle M \rangle = (\langle M \rangle_t, t \ge 0)$ and called a quadratic
variation process, such that $M^2 - \langle M \rangle$ is a martingale with respect
to $(\mathcal{F}_t)$ (see Dellacherie and Meyer 1982; Elliott 1982; Métivier
1982; Liptser and Shiryaev 1977/78, 1989). In our case we easily
see that

$\langle w \rangle_t = t$ and $\langle \tilde{N} \rangle_t = t$.

Again, two very different square integrable martingales have the
same quadratic variation processes. Obviously, in both cases
$\langle w \rangle$ and $\langle \tilde{N} \rangle$ are deterministic functions (indeed, continuous), the
processes $w$ and $\tilde{N}$ have independent increments, $w$ is a.s. continuous,
while almost all trajectories of $N$ are discontinuous (increasing
stepwise functions, left- or right-continuous, with unit jumps only).

Therefore, neither the covariance function nor the quadratic
variation characterizes the processes $w$ and $\tilde{N}$ uniquely.


(ii) The above reasoning can be extended. Take the function

$C(s, t) = e^{-2\lambda|s - t|}, \quad s, t \ge 0, \quad \lambda = \text{constant} > 0.$

It can be checked that $C(s, t)$ is positive definite and hence there
exists a Gaussian stationary process with zero-mean function and
covariance function equal to $C$. We shall now construct two
stationary processes, say $X$ and $Y$, each with a covariance function
$C$; moreover, $X$ will be defined by the Wiener process $w$ and $Y$ by
the Poisson process $N$ with parameter $\lambda$.

Consider the process $X = (X_t, t \ge 0)$ where

$X_t = e^{-\beta t} w(\alpha e^{2\beta t}).$

Here $\alpha > 0$ and $\beta > 0$ are fixed constants. This process $X$ is called
the Ornstein-Uhlenbeck process with parameters $\alpha$ and $\beta$. It is easy
to conclude that $X$ is a continuous stationary Gaussian and Markov
process with $EX_t = 0$, $t \ge 0$ and covariance function
$C_X(s, t) = \alpha e^{-\beta|s - t|}$. So, if we take $\alpha = 1$ and $\beta = 2\lambda$, we obtain
$C_X(s, t) = e^{-2\lambda|s - t|}$ (the function $C$ given at the beginning).
Further, let $Y = (Y_t, t \ge 0)$ be a process defined by

$Y_t = Y_0 (-1)^{N_t}$

where $Y_0$ is a r.v. taking two values, 1 and −1, with probability $\tfrac{1}{2}$
each and $Y_0$ does not depend on $N$. The process $Y$, called a random
telegraph signal, is a stationary process with $EY_t = 0$, $t \ge 0$ and
covariance function $C_Y(s, t) = e^{-2\lambda|s - t|}$. Obviously, $Y$ takes only
two values, 1 and −1; it is not continuous and is not Gaussian.

Thus using the processes $w$ and $N$ we have constructed in very
different ways two new processes, $X$ and $Y$, which have the same
covariance functions.
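
The coincidence of the two covariance functions can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the Ornstein-Uhlenbeck process in the form $X_t = e^{-\beta t} w(\alpha e^{2\beta t})$ with $E[w(u)w(v)] = \min\{u, v\}$, and the telegraph signal in the form $Y_t = Y_0(-1)^{N_t}$, so that $E[Y_s Y_t] = E[(-1)^{N_t - N_s}]$. The function names are hypothetical.

```python
import math


def telegraph_cov(s: float, t: float, lam: float, terms: int = 200) -> float:
    """E[Y_s Y_t] = E[(-1)^K] with K ~ Poisson(lam * |t - s|), summed term by term."""
    mu = lam * abs(t - s)
    total = 0.0
    pmf = math.exp(-mu)  # P[K = 0]
    for k in range(terms):
        total += pmf if k % 2 == 0 else -pmf
        pmf *= mu / (k + 1)  # P[K = k + 1] from P[K = k]
    return total


def ou_cov(s: float, t: float, alpha: float, beta: float) -> float:
    """E[X_s X_t] for X_t = exp(-beta t) w(alpha exp(2 beta t))."""
    return math.exp(-beta * (s + t)) * alpha * math.exp(2.0 * beta * min(s, t))


if __name__ == "__main__":
    lam, s, t = 0.7, 0.3, 1.5
    print(telegraph_cov(s, t, lam), ou_cov(s, t, 1.0, 2.0 * lam), math.exp(-2.0 * lam * abs(t - s)))
```

With $\alpha = 1$ and $\beta = 2\lambda$ all three printed numbers agree up to rounding, although one process is Gaussian and continuous and the other takes only the values 1 and −1.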


(iii) Here we look at other functionals of the processes $w$ and $N$.
Consider the processes $U = (U_t, t \ge 0)$ and $V = (V_t, t \ge 0)$ defined
by

$U_t = \int_0^t X_s\, ds, \quad V_t = v(N_t)$

where $X$ is the Ornstein-Uhlenbeck process introduced above and
the function $v$ is such that $v(2n) = v(2n + 1) = n$, $n = 0, 1, \dots$

What can we say about the processes $U$ and $V$? Obviously, $U$ is
Gaussian because it is derived from the Gaussian process $X$ by a
linear operation. Direct calculation shows that $EU_t = 0$, $t \ge 0$ and

$C_U(s, t) = \frac{2\alpha}{\beta} \min\{s, t\} + \frac{\alpha}{\beta^2}\left(e^{-\beta \min\{s,t\}} + e^{-\beta \max\{s,t\}} - e^{-\beta|s - t|} - 1\right).$

Clearly, if we take $\beta = 2\alpha$, then for large $s$, $t$ and $|s - t|$ we have
$C_U(s, t) \approx \min\{s, t\}$. So we can say that, asymptotically, the
process $U$ has the same covariance structure as the Wiener process.
Both processes are continuous but some of their other properties
are very different. In particular, $U$ is not Markov and is not a
process with stationary increments.

Consider now the process $V$. Does this process obey the
properties of the original process $N$? From the definition it follows
that $V$ is a counting process which, however, only counts the
arrivals $t_2, t_4, t_6, \dots$ from $N$. Further, for $0 < t - h < t < t + h$ we
have

$P[V_{t+h} - V_t = 0] = e^{-\lambda h} + \tfrac{1}{2}\lambda h e^{-\lambda h}(1 + e^{-2\lambda t})$

which means that $V$ is not a process with stationary increments.
Moreover, it is easy to establish that the increments of $V$ are not
independent. Finally, from the relations

$\lim_{v \downarrow 0} P[V_{t+h} = 1 \mid V_t = 1, V_{t-v} = 0] = P[N_h = 0 \text{ or } 1]$

and

$P[N_h = 0 \text{ or } 1] > P[V_{t+h} = 1 \mid V_t = 1]$

we conclude that $V$ is not a Markov process.

Thus the process $V$, obtained as a function of the Poisson
process $N$, does not obey at least three of the essential properties of
$N$. Actually, this is not so surprising, since $v$ as defined above is
not a one-one function.


24.2. Can the Poisson process be characterized by only one of 
its properties? 


Recall that we can construct a Poisson process in the interval [0,1] 
by choosing the number of points according to a Poisson 
distribution and then distributing them independently of each other 
and uniformly on [0,1] (see Doob 1953). 

We now consider an example of a point process, say S, on the 
interval [0,1] such that the number of points in any subinterval has 
a Poisson distribution with given parameter A, but the numbers of 
points in disjoint subintervals are not independent. Obviously such 
a process cannot be a Poisson process. 


Fix $\lambda > 0$, choose a number $n$ with probability $e^{-\lambda}\lambda^n/n!$,
$n = 0, 1, \dots$ and define the d.f. $F_n$ of the $n$ points $t_1, \dots, t_n$ of $S$
as follows. If $n \ne 3$ let

$F_n(x_1, \dots, x_n) = x_1 \cdots x_n$

and if $n = 3$ let

(1) $F_3(x_1, x_2, x_3) = x_1 x_2 x_3 + \varepsilon x_1 x_2 x_3 (x_1 - x_2)^2 (x_1 - x_3)^2 (x_2 - x_3)^2 (1 - x_1)(1 - x_2)(1 - x_3).$

It is easy to see that for sufficiently small $\varepsilon > 0$, $F_3$ is a d.f.
Moreover, it is obvious that the process $S$ described by the family
of d.f.s $\{F_n\}$ is not a Poisson process. Thus it remains for us to
show that the number of points of $S$ in any subinterval of [0,1] has
a Poisson distribution. For positive integers $m \le n$ and
$(a, b) \subset [0,1]$ we have


(2) $G_{m,n}(a, b) = P_n[\text{exactly } m \text{ of } t_1, \dots, t_n \in (a, b)]$
$= \binom{n}{m} P_n[t_1, \dots, t_m \in (a, b),\ t_{m+1}, \dots, t_n \notin (a, b)]$
$= \binom{n}{m} E\left[\prod_{i=1}^{m} (X_b(t_i) - X_a(t_i)) \prod_{i=m+1}^{n} (1 - X_b(t_i) + X_a(t_i))\right]$

where $X_a(t) = 1$ if $t < a$ and $X_a(t) = 0$ if $t \ge a$. Moreover, since

$E[X_{a_1}(t_1) X_{a_2}(t_2) \cdots X_{a_n}(t_n)] = F_n(a_1, \dots, a_n)$

then in (2) only terms of the form $F_n(a_1, \dots, a_n)$ appear where for
each $i$, $a_i$ is equal either to $a$, or to $b$, or to 1. Hence if

(3) $F_n(a_1, \dots, a_n) = a_1 \cdots a_n$

for all such values of $a_1, \dots, a_n$, then $G_{m,n}(a, b)$ in (2) will be the
same as in the Poisson case. For $n \ne 3$ this follows from the choice
of $F_n$ as a uniform d.f. For $n = 3$, relation (3) follows from (1) and
the remark before (3).

The final conclusion is that the Poisson process cannot be 
characterized by only one of its properties even if this is the most 
important property. 
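
The key step, that the perturbation in (1) is invisible whenever every argument is one of $a$, $b$ or 1, can be checked mechanically. The sketch below assumes the perturbation has the form $\varepsilon x_1 x_2 x_3 (x_1 - x_2)^2 (x_1 - x_3)^2 (x_2 - x_3)^2 (1 - x_1)(1 - x_2)(1 - x_3)$; the function names are hypothetical and serve only this illustration.

```python
from itertools import product


def perturbation(x1: float, x2: float, x3: float, eps: float = 1.0) -> float:
    """Assumed second term of F_3 in (1): the deviation from the uniform d.f."""
    return (eps * x1 * x2 * x3
            * (x1 - x2) ** 2 * (x1 - x3) ** 2 * (x2 - x3) ** 2
            * (1 - x1) * (1 - x2) * (1 - x3))


def max_perturbation_on_grid(a: float, b: float) -> float:
    """Largest |perturbation| over all triples with coordinates in {a, b, 1}."""
    return max(abs(perturbation(*p)) for p in product((a, b, 1.0), repeat=3))


if __name__ == "__main__":
    print(max_perturbation_on_grid(0.2, 0.7))  # every triple gives zero
    print(perturbation(0.1, 0.4, 0.8))         # a generic point does not
```

Each triple either contains a coordinate equal to 1, which kills a factor $(1 - x_i)$, or has only the two values $a$ and $b$ available for three coordinates, so two coordinates coincide and a squared difference vanishes.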


24.3. The conditions under which a process is a Poisson process 
cannot be weakened 


Let $\nu$ be a point process on $\mathbb{R}^1$, $\nu(I)$ denote the number of points
which fall into the interval $I$ and $|I|$ be the length of $I$. Recall that
the stationary Poisson process can be characterized by the
following two properties.

(A) For any interval $I$, $P[\nu(I) = k] = e^{-\lambda|I|}(\lambda|I|)^k / k!$, $k = 0, 1, 2, \dots$

(B) For any number $n$ of disjoint intervals $I_1, \dots, I_n$, the r.v.s
$\nu(I_1), \dots, \nu(I_n)$ are independent, $n = 2, 3, \dots$

In Example 24.2 we have seen that condition (B) cannot be
removed if the Poisson property of the process is to be preserved.
Suppose now that condition (A) is satisfied but (B) is replaced by
another condition which is weaker, namely:

($B_2$) for any disjoint intervals $I_1$ and $I_2$ the r.v.s $\nu(I_1)$ and $\nu(I_2)$ are
independent.

Thus we come to the following question (posed in a similar form
by A. Rényi): do conditions (A) and ($B_2$) imply that $\nu$ is a Poisson
process? The construction below shows that in general the answer
is negative.

For our purpose it is sufficient to construct the process $\nu$ in the
interval [0,1]. Let $\nu$ be a Poisson process with parameter $\lambda$ with
respect to a given probability measure $P$. The idea is to introduce
another measure, denoted by $\tilde{P}$, with respect to which the process $\nu$
will not be Poisson.

Define the unconditional and conditional probabilities of $\nu$ with
respect to $\tilde{P}$ by the relations

(1) $\tilde{P}[\nu([0,1]) = k] = P[\nu([0,1]) = k] = e^{-\lambda}\lambda^k / k!$, $k = 0, 1, 2, \dots$
(2) $\tilde{P}[\,\cdot \mid \nu([0,1]) = k] = P[\,\cdot \mid \nu([0,1]) = k]$, if $k \ne 5$.

If $\nu([0,1]) = k$ and we take a random permutation of the $k$ points
of a Poisson process which fall into [0,1], then the distribution of
the $k$-dimensional vector obtained is the same as the distribution of
a vector whose components are independent and uniformly
distributed in [0,1], that is its conditional d.f. $F_k$ given $\nu([0,1]) = k$
has the form

$F_k(x_1, \dots, x_k) = x_1 \cdots x_k$, where $0 \le x_j \le 1$, $j = 1, \dots, k$.

From (2) it follows that for $k \ne 5$ the d.f. $\tilde{F}_k$ of $\nu$ about $\tilde{P}$ satisfies
the relations

(3) $\tilde{F}_k = F_k$.

For $k = 5$ and $0 \le x_j \le 1$, $j = 1, \dots, 5$ we define $\tilde{F}_5$ as follows:

(4) $\tilde{F}_5(x_1, \dots, x_5) = x_1 \cdots x_5 + \varepsilon x_1 \cdots x_5 (1 - x_1) \cdots (1 - x_5) \prod_{1 \le i < j \le 5} (x_i - x_j) = F_5(x_1, \dots, x_5) + H(x_1, \dots, x_5).$

It is easy to check that for $\varepsilon$ positive and sufficiently small the
mixed partial derivative $(\partial^5/\partial x_1 \cdots \partial x_5)\tilde{F}_5(x_1, \dots, x_5)$ is a
probability density function and thus $\tilde{F}_5$ is a d.f. It is obvious that
our process $\nu$, and also the measure $\tilde{P}$, are determined by (1), (2),
(3) and (4). Moreover, (4) means that $\nu$ is not a Poisson process.
Clearly it remains for us to verify that the probability measure $\tilde{P}$
satisfies conditions (A) and ($B_2$). These conditions are satisfied for
the measure $P$, so it is sufficient to prove that for disjoint intervals
$I_1$ and $I_2$ we have

$\tilde{P}[\nu(I_1) = k_1, \nu(I_2) = k_2] = P[\nu(I_1) = k_1, \nu(I_2) = k_2].$

By the definition of $\tilde{P}$ we see that we need to establish the relation

(5) $\tilde{P}[\nu(I_1) = k_1, \nu(I_2) = k_2 \mid \nu([0,1]) = 5] = P[\nu(I_1) = k_1, \nu(I_2) = k_2 \mid \nu([0,1]) = 5].$

The probability in the left-hand side of (5) is a finite sum of the
form $\sum (\pm \tilde{F}_5(a_1, \dots, a_5))$ where each $a_j$ is either 0, or 1, or the
endpoint of one of the intervals $I_1$ or $I_2$. So the difference between
the two sides of (5) is equal to $\sum (\pm H(a_1, \dots, a_5))$. Obviously, each
term in this sum is 0. This is clear if 0 or 1 occurs among the $a_j$; if
not, then at least two of the $a_j$ are the same, so $H$ vanishes again.

Therefore the measure $\tilde{P}$ satisfies conditions (A) and ($B_2$). This
means that we have described a process $\nu$ which obeys the
properties (A) and ($B_2$), but nevertheless $\nu$ is not a Poisson
process.

Condition ($B_2$) can be replaced by a slightly stronger condition
of the same type, ($B_M$), which includes the mutual independence of
the r.v.s $\nu(I_1), \dots, \nu(I_M)$ for any $M$ disjoint intervals $I_1, \dots, I_M$ where,
let us emphasise, $M$ is finite. The conclusion in this case is the
same as for $M = 2$.


24.4. Two dependent Poisson processes whose sum is still a 
Poisson process 


Let $X = (X(t), t \ge 0)$ and $Y = (Y(t), t \ge 0)$ be Poisson processes with
given parameters. It is well known and easy to check that if $X$ and
$Y$ are independent then their sum $X + Y = (X(t) + Y(t), t \ge 0)$ is also
a Poisson process. Let us now consider the converse question. $X$
and $Y$ are Poisson processes and we know that their sum $X + Y$ is
also a Poisson process. Does it follow that $X$ and $Y$ are
independent? The example below shows that in general the answer
is negative.

Let $g(x, y) = e^{-x-y}$ for $x \ge 0$ and $y \ge 0$, and $g(x, y) = 0$ otherwise.
So $g$ is the density of a pair of independent exponentially
distributed r.v.s each of rate 1. We introduce the function


i W (Ose 13rd) or Ls oe 2) 

or 2 ees ey< ll) or b= ee 21 ae) 

nay) =< —o, ff O<7<1,2<97<3) o la r< 2a y< 4) 
“2g <3. Lee?) G@ Be ee 40S ge) 


Q, otherwise 
where $\alpha$ = constant, $0 < \alpha < e^{-6}$ and define

$f(x, y) = g(x, y) + f_1(x, y), \quad (x, y) \in \mathbb{R}^2.$

It is easy to check that: (a) $f$ is a density of some d.f. on $\mathbb{R}^2$; (b)
the marginals of $f$ are exponential of rate 1; (c) for each
non-negative measurable function $h$ on $\mathbb{R}^2$ such that $h(x, y) = h(y, x)$ the
following equality holds:

(1) $\int_{\mathbb{R}^2} f(x, y) h(x, y)\, dx\, dy = \int_{\mathbb{R}^2} g(x, y) h(x, y)\, dx\, dy.$

Now let $\Omega = \mathbb{R}^2 \times \mathbb{R}^2 \times \cdots$ be the infinite countable product
of the space $\mathbb{R}^2$ with itself, that is $\Omega = (\mathbb{R}^2)^\infty$. Define
$W_n(\omega) = (U_n(\omega), V_n(\omega))$, $n \ge 1$, as the $n$th coordinate of $\omega \in \Omega$. Let $\mathcal{A}$ be the
σ-field generated by the coordinates. We shall provide $(\Omega, \mathcal{A})$ with
two different probability measures, say $P$ and $Q$, as follows:

(a) $P$ is a measure for which $\{W_n, n \ge 1\}$ is a sequence of
independent r.v.s, $W_1$ has density $f$ and each $W_n$, $n \ge 2$,
has density $g$;
(b) $Q$ is a measure for which $\{W_n, n \ge 1\}$ is a sequence of i.i.d.
r.v.s each having the same density $g$.


Put $U_0 = V_0 = 0$ and define the processes $X$, $Y$ and $Z = X + Y$
where:

$X(t) = n$, if $\sum_{k=0}^{n} U_k \le t < \sum_{k=0}^{n+1} U_k$,
$Y(t) = n$, if $\sum_{k=0}^{n} V_k \le t < \sum_{k=0}^{n+1} V_k$,
$Z(t) = X(t) + Y(t)$.

$\{U_n, n \ge 1\}$ is a sequence of independent exponential r.v.s of rate 1
with respect to each of the measures $P$ and $Q$. The same holds for
the sequence $\{V_n, n \ge 1\}$. Hence $X$ and $Y$ are Poisson processes
with respect to both $P$ and $Q$. Moreover, $X$ and $Y$ are independent
for $Q$ which implies that $Z$ is a Poisson process for $Q$.

The next step is to show that $X$ and $Y$ are not independent for $P$.
This will follow from the relation

$P[X(2) = 0, X(3) \ge 1, Y(1) \ge 1] = P[X(2) = 0, X(3) \ge 1]\, P[Y(1) \ge 1] + \alpha.$

Now let $B \subset \Omega$ and $\tilde{B}$ be the set of all points
$((x_1, y_1), \dots, (x_n, y_n), \dots)$ such that $((y_1, x_1), \dots, (y_n, x_n), \dots) \in B$.
Using relation (1) we can prove that $P(B) = Q(B)$ for any measurable
subset $B \subset \Omega$ such that $\tilde{B} = B$ (for details see Jacod 1975).

It remains for us to show that $Z = (Z(t), t \ge 0)$ is a Poisson
process for the measure $P$. Note first that each event $B$ which
depends only on the process $Z$ (this means that $B$ belongs to the
σ-field generated by $Z$) satisfies the equality $\tilde{B} = B$. Since $Z$ is a
Poisson process for the measure $Q$, $Z$ must also be a Poisson
process for the measure $P$. More precisely, if $s_1 < t_1 < \dots < s_k < t_k$
we see from the above reasoning that

$P[Z(t_1) - Z(s_1) = n_1, \dots, Z(t_k) - Z(s_k) = n_k] = Q[Z(t_1) - Z(s_1) = n_1, \dots, Z(t_k) - Z(s_k) = n_k]$
$= \prod_{i=1}^{k} \exp\{-2(t_i - s_i)\}[2(t_i - s_i)]^{n_i} / n_i!.$

Obviously this relation illustrates the fact that $Z$ is a Poisson
process with respect to the probability measure $P$.

Note that the present example is in some sense an analogue to
Example 12.3 where we considered an interesting property of
dependent Poisson r.v.s.


24.5. Multidimensional Gaussian processes which are close to 
the Wiener process 

Recall that $w = ((w_1(t), \dots, w_n(t)), t \ge 0)$ is said to be an
$n$-dimensional standard Wiener process if each component
$w_j = (w_j(t), t \ge 0)$, $j = 1, \dots, n$, is a one-dimensional standard Wiener process
and $w_1, \dots, w_n$ are independent. Further, the linear combinations

$Y(t) = \sum_{j=1}^{n} \lambda_j w_j(t), \quad t \ge 0, \quad \lambda_j \in \mathbb{R}^1$

are often called the projections of $w$ and it is very easy to see that

(1) $E[Y(s)Y(t)] = \left(\sum_{j=1}^{n} \lambda_j^2\right) \min\{s, t\}.$


Suppose now $X = ((X_1(t), \dots, X_n(t)), t \ge 0)$ is a Gaussian process
whose projections $Z(t) = \sum_{j=1}^{n} \lambda_j X_j(t)$, $\lambda_j \in \mathbb{R}^1$, satisfy the
relation

(2) $E[Z(s)Z(t)] = \left(\sum_{j=1}^{n} \lambda_j^2\right) \min\{s, t\}.$

Comparing (2) and (1) we see that in some sense the projections
of the process $X$ behave like those of the Wiener process $w$. Since
in (2) and (1), $\lambda_1, \dots, \lambda_n$ are arbitrary numbers in $\mathbb{R}^1$, and $s$ and $t$ are
also arbitrary in $\mathbb{R}^+$, we could conjecture that $X$ is a standard
Wiener process in $\mathbb{R}^n$. However, the example below shows that in
general this is not the case. To see this, consider for simplicity the
case $n = 2$. Take two independent Wiener processes,
$w_1 = (w_1(t), t \ge 0)$ and $w_2 = (w_2(t), t \ge 0)$, and define the process
$X = ((X_1(t), X_2(t)), t \ge 0)$ by

$X_1(t) = w_1(\tfrac{2}{3}t) + w_2(\tfrac{1}{3}t), \quad X_2(t) = w_1(\tfrac{1}{3}t) - w_2(\tfrac{2}{3}t).$

Then $w = ((w_1(t), w_2(t)), t \ge 0)$ is a standard Wiener process in $\mathbb{R}^2$
and for any $\lambda_1, \lambda_2 \in \mathbb{R}^1$ the projections $Y(t) = \lambda_1 w_1(t) + \lambda_2 w_2(t)$
satisfy (1). Further, if we take the same $\lambda_1$, $\lambda_2$ we can easily show
that the projections $Z(t) = \lambda_1 X_1(t) + \lambda_2 X_2(t)$ of the Gaussian
process $X$ satisfy (2). However, this coincidence of the covariances
of the projections of $X$ and $w$ does not imply that $X$ is a standard
Wiener process in $\mathbb{R}^2$. It is enough to note that the components $X_1$
and $X_2$ of $X$ are not independent.

Note that the Gaussian process $X$ with property (2) will be a
standard Wiener process in $\mathbb{R}^n$ if we impose some other conditions
(see Hardin 1985).
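
The claim can be checked with exact covariance arithmetic. The sketch below assumes the two-dimensional example in the form $X_1(t) = w_1(\tfrac{2}{3}t) + w_2(\tfrac{1}{3}t)$, $X_2(t) = w_1(\tfrac{1}{3}t) - w_2(\tfrac{2}{3}t)$; these particular time changes are one choice consistent with relation (2) and should be treated as an assumption of the illustration, as are the function names. Only the identity $E[w_i(u)w_i(v)] = \min\{u, v\}$ and the independence of $w_1$ and $w_2$ are used.

```python
# X_i(t) = w1(A[i] * t) + SIGN[i] * w2(B[i] * t), with w1 and w2 independent.
A = {1: 2.0 / 3.0, 2: 1.0 / 3.0}
B = {1: 1.0 / 3.0, 2: 2.0 / 3.0}
SIGN = {1: 1.0, 2: -1.0}


def cov_component(i: int, j: int, s: float, t: float) -> float:
    """E[X_i(s) X_j(t)] computed from E[w(u) w(v)] = min(u, v)."""
    return min(A[i] * s, A[j] * t) + SIGN[i] * SIGN[j] * min(B[i] * s, B[j] * t)


def cov_projection(l1: float, l2: float, s: float, t: float) -> float:
    """E[Z(s) Z(t)] for the projection Z = l1 * X_1 + l2 * X_2."""
    lam = {1: l1, 2: l2}
    return sum(lam[i] * lam[j] * cov_component(i, j, s, t)
               for i in (1, 2) for j in (1, 2))


if __name__ == "__main__":
    l1, l2, s, t = 1.3, -0.4, 1.0, 2.0
    print(cov_projection(l1, l2, s, t), (l1 ** 2 + l2 ** 2) * min(s, t))
    print(cov_component(1, 2, s, t))  # non-zero cross-covariance
```

The projections reproduce (2) because the two cross-covariances $E[X_1(s)X_2(t)]$ and $E[X_2(s)X_1(t)]$ cancel each other, yet each of them is non-zero for $s \ne t$, so the components are dependent.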


24.6. On the Wald identities for the Wiener process 


Let $w = (w(t), t \ge 0)$ be a standard Wiener process and $\tau$ be an
$(\mathcal{F}_t^w)$-stopping time. The following three relations

(1) $E w(\tau) = 0$,
(2) $E[w^2(\tau)] = E\tau$,
(3) $E[\exp(w(\tau) - \tfrac{1}{2}\tau)] = 1$

are called the Wald identities for the Wiener process. Let us
introduce three conditions, namely

(1*) $E\sqrt{\tau} < \infty$,
(2*) $E\tau < \infty$,
(3*) $E[\exp(\tfrac{1}{2}\tau)] < \infty$.

Note that (1*), (2*) and (3*) are sufficient conditions for the
validity of (1), (2) and (3) respectively (see Burkholder and Gundy
1970; Novikov 1972; Liptser and Shiryaev 1977/78).

Our purpose here is to analyse these conditions. In particular, to
clarify what happens to (1), (2) and (3) when changing (1*), (2*)
and (3*).

Firstly, take the stopping time $\tau_1 = \inf\{t \ge 0 : w(t) = 1\}$. By the
continuity of the Wiener process $w$ we have $w(\tau_1) = 1$ and hence
$E w(\tau_1) = 1$ but not 0 as in (1). However, the r.v. $\tau_1$ has density
$(2\pi t^3)^{-1/2} \exp(-1/(2t))$, $t > 0$, and it is easy to check that
$E[\tau_1^{\delta}] < \infty$ for all $\delta < \tfrac{1}{2}$ but $E[\tau_1^{1/2}] = \infty$, so (1*) is violated.
Obviously, identity (2) is also not satisfied because
$E[w^2(\tau_1)] = 1 \ne E\tau_1 = \infty$.


Regarding the identity (1) we can go further. Among many other
results Novikov (1983) proved the following statement. Let $f(t)$,
$t \ge 0$, be a positive, continuous and non-decreasing function such
that

$\int_1^\infty t^{-3/2} f(t)\, dt = \infty.$

Then for any $(\mathcal{F}_t^w)$-stopping time $\tau$ with $E[f(\tau)] < \infty$ and
$E[|w(\tau)|] < \infty$ we have $E w(\tau) = 0$. Let us show that the
integrability condition for $f$ cannot be weakened.


Suppose | that f is positive, continuous and non-decreasing, f (O) > 
(2 
0 and Ir 1 t Wl" f(t) dt < oo, Consider the stopping time t7 = inf{t > 


0: w(t) => 1 - f (t)}. It can be shown (for details see Novikov 
(1983)) that 


E||w(72)|] < 00, El[f(72)] < co but Ew(72) > 0. 


Now consider condition (3*) and the identity (3). It is not
difficult to show that (3*) cannot be essentially weakened. Indeed,
define the stopping time

$\tau_a = \inf\{t \ge 0 : w(t) \le -1 + at\}$

where $a$ is an arbitrary real number. Since $\tau_a$ has the density

$(2\pi t^3)^{-1/2} \exp[-\tfrac{1}{2}(-1 + at)^2/t],$

it is easy to verify that $E[\exp(\tfrac{1}{2}a^2\tau_a)] < \infty$ for each $a$.
Furthermore,

$E[\exp(w(\tau_a) - \tfrac{1}{2}\tau_a)] = \begin{cases} 1, & \text{if } a \ge 1 \\ c_a, & \text{if } a < 1. \end{cases}$

Here $c_a$ is a constant depending on $a$. Its exact value is not
important but it is essential that $c_a < 1$. Therefore the coefficient $\tfrac{1}{2}$
in the exponent in condition (3*) is the 'best possible' case for
which the Wald identity (3) still holds.
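
The dichotomy between $a \ge 1$ and $a < 1$ can be seen numerically. The sketch below is an illustration under stated assumptions: it uses the density $(2\pi t^3)^{-1/2}\exp[-\tfrac{1}{2}(-1 + at)^2/t]$ quoted above and the relation $w(\tau_a) = -1 + a\tau_a$ on $\{\tau_a < \infty\}$, which follows from the continuity of $w$. After the substitution $t = 1/s^2$ the expectation becomes $\frac{2}{\sqrt{2\pi}} \int_0^\infty \exp\{(a - 1) - \tfrac{1}{2}s^2 - \tfrac{1}{2}(a - 1)^2/s^2\}\, ds$, which is evaluated by a midpoint rule. The function name, the cut-off and the step count are hypothetical choices.

```python
import math


def wald_exponential_expectation(a: float, s_max: float = 12.0, steps: int = 120_000) -> float:
    """Midpoint-rule value of E[exp(w(tau_a) - tau_a / 2)] after substituting t = 1 / s^2."""
    h = s_max / steps
    c = a - 1.0
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h  # midpoint of the k-th cell, never equal to 0
        total += math.exp(c - 0.5 * s * s - 0.5 * c * c / (s * s))
    return 2.0 / math.sqrt(2.0 * math.pi) * total * h


if __name__ == "__main__":
    for a in (0.0, 0.5, 1.0, 2.5):
        print(a, wald_exponential_expectation(a))
```

Under these assumptions the integral can also be evaluated in closed form as $\exp\{(a - 1) - |a - 1|\}$, which equals 1 for $a \ge 1$ and $e^{2(a - 1)} < 1$ for $a < 1$, in agreement with the statement about $c_a$.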

The Wald identity (3) is closely connected with a more general
problem of characterization of the uniform integrability of the class
of exponential continuous local martingales (see Liptser and
Shiryaev 1977/78; Novikov 1979; Kazamaki and Sekiguchi 1983;
Liptser and Shiryaev 1989; Kazamaki 1994). (It is useful to
compare (3) and (3*) with the description in Example 24.7.)


24.7. Wald identity and a non-uniformly integrable martingale 
based on the Wiener process 


Let us formulate first the following very recent and general result
(see Novikov 1996). Suppose $X = (X_t, \mathcal{F}_t, t \ge 0)$ is a square
integrable local martingale with bounded jumps
($|\Delta X_t| = |X_t - X_{t-}| \le$ constant a.s. for each $t$) and such that
$\langle X \rangle_\infty < \infty$ a.s. and $E[X_\infty]$ exists. Then

(1) $\liminf_{t \to \infty} (\sqrt{t}\, P[\langle X \rangle_\infty > t]) \ge \sqrt{2/\pi}\, |E[X_\infty]|$

and in particular $\liminf_{t \to \infty} (\sqrt{t}\, P[\langle X \rangle_\infty > t]) = 0$ implies
$E[X_\infty] = 0$.

From this result we can easily derive an elegant corollary for the
Wiener process $w = (w_t, t \ge 0)$. Let $\tau$ be a $(\mathcal{F}_t)$-stopping time such
that $\tau < \infty$ a.s. and $E[w_\tau]$ exists. Then the process $X_t := w_{t \wedge \tau}$, $t \ge 0$
is a square integrable local martingale (even continuous) with
$\langle X \rangle_\infty = \tau$ and $X_\infty = w_\tau$ and in this case (1) takes the form

(2) $\liminf_{t \to \infty} (\sqrt{t}\, P[\tau > t]) \ge \sqrt{2/\pi}\, |E[w_\tau]|.$

In particular, $\liminf_{t \to \infty} (\sqrt{t}\, P[\tau > t]) = 0 \Rightarrow E[w_\tau] = 0$.
Example 24.6 shows that the Wald identity $E[w_\tau] = 0$ does not
hold for the stopping times $\tau_A = \inf\{t : w_t = A\}$, where $A$ is a real number.
Note however that $P[\tau_A > t] \approx \sqrt{2/\pi}\, |A|\, t^{-1/2}$ for large $t$,
implying that $\liminf_{t \to \infty} (\sqrt{t}\, P[\tau_A > t]) > 0$.
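
The tail asymptotics of the hitting time can be checked against an exact formula. The sketch below relies on the reflection principle, which gives $P[\tau_A > t] = P[|w_t| < |A|] = \mathrm{erf}(|A|/\sqrt{2t})$; this identity is a standard fact brought in for the illustration and is not quoted from the source, and the function names are hypothetical.

```python
import math


def survival_hitting_time(level: float, t: float) -> float:
    """P[tau_A > t] for tau_A = inf{t : w_t = A}, via the reflection principle."""
    return math.erf(abs(level) / math.sqrt(2.0 * t))


def scaled_tail(level: float, t: float) -> float:
    """sqrt(t) * P[tau_A > t], the quantity whose lim inf appears in (2)."""
    return math.sqrt(t) * survival_hitting_time(level, t)


if __name__ == "__main__":
    level = 2.0
    limit = math.sqrt(2.0 / math.pi) * abs(level)
    for t in (1e2, 1e4, 1e6, 1e8):
        print(t, scaled_tail(level, t), limit)
```

The scaled tail approaches $\sqrt{2/\pi}\,|A|$, which is exactly the right-hand side of (2) with $|E[w_{\tau_A}]| = |A|$, so for these stopping times the bound in (2) is attained.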

Thus we arrive at the question: is there a more general
martingale $X$ satisfying the conditions in the above result of
Novikov and such that

(3) $\liminf_{t \to \infty} (\sqrt{t}\, P[\langle X \rangle_\infty > t]) = 0$ but $\limsup_{t \to \infty} (\sqrt{t}\, P[\langle X \rangle_\infty > t]) > 0$

and, if so, what additional conclusion can be derived?
Let us show by a specific example that both relations (3) are
possible. Indeed, take the increasing sequence $1 = t_1 < t_2 < t_3 < \cdots$
and define the function $g(s)$, $s \ge 0$, where


Ls i v= ¢< l=, 
ry l/s, if t; <s < t,4, for oddé 
2 Le. 2s St ee TOC SVEN TE, 8 tl Feces 
Introduce the following two stopping times: 
Foes mnt wy 1 ond & Sin ee OF 
and define the process m = (m;, t= 0) by 


LAT 
— / g(s) dws. 
0 


Then m is a square ae a es, which is continuous 


and such that ("7/2 = G(t = fo 9 8) ds and (M)oo = G(t) < 1 a.s. 
Moreover, the relations 


mao = f g(s) dw, = 2w, + +f (2—g(s))dw,, wp = 0 as., 
Jo 


[ (2 — g(s))*ds < 1+ / (1/s)? ds <2 
J () 


i 


imply that $m_\infty$ is integrable. The next step is to check that for large
$t$ we have $P[\tau > t] \approx c\, t^{-1/2}$ and, since $G(t)$, $t \ge 0$ is strictly
monotone (due to the special choice of $g$ above), there is an inverse
function $G^{-1}$ and

$P[\langle m \rangle_\infty > t] = P[\tau > G^{-1}(t)] \approx c\, (G^{-1}(t))^{-1/2}.$

Thus we conclude that

$\liminf_{t \to \infty} (\sqrt{t}\, P[\langle m \rangle_\infty > t]) = 0 \quad (\Rightarrow E[m_\infty] = 0)$

while

(4) $\limsup_{t \to \infty} (\sqrt{t}\, P[\langle m \rangle_\infty > t]) > 0.$

It should be noted that (4) is a sufficient and necessary condition
for the process $m$ to be non-uniformly integrable (see Azéma et al.
1980). Therefore we have described a continuous square integrable
local martingale $m = (m_t, t \ge 0)$ with $E[m_\infty] = 0$ but despite these
properties, $m$ is not uniformly integrable.


24.8. On some properties of the variation of the Wiener process 


(i) Let us consider the Wiener process $w$ in the unit interval [0,1].
For any fixed $p \ge 1$ let

$V_p(w) = \sup_{\pi_n} \sum_{k=0}^{n-1} |w(t_{k+1}) - w(t_k)|^p$

where sup is taken over all finite partitions
$\pi_n = \{0 = t_0 < t_1 < \dots < t_n = 1\}$ of [0,1]. The quantity $V_p(w)$ is called a
$p$-variation (or maximal $p$-variation) of the Wiener process in [0,1].
Let us also introduce the so-called expected $p$-variation of $w$ as
$E[V_p(w)]$.

We are interested in the conditions ensuring that $V_p(w)$ and
$E[V_p(w)]$ take finite values. It is better to consider an even more
general situation.

Suppose $X = (X(t), t \in [0,1])$ is a separable Gaussian process with
$EX(t) = 0$, $t \in [0,1]$ and let $e_X(s, t) = E|X(s) - X(t)|$. Firstly,
according to the 0-1 law for Gaussian processes, the probability
$P[V_p(X) < \infty]$ is either 1 or 0 (see Jain and Monrad 1983). Further,
it can be shown that if $P[V_p(X) < \infty] = 1$, then it is also true that
$E[V_p(X)] < \infty$ (see Fernique 1974). Since

$E\left[\sup_{\pi_n} \sum_{k=0}^{n-1} |X(t_{k+1}) - X(t_k)|^p\right] \ge \sup_{\pi_n} E\left[\sum_{k=0}^{n-1} |X(t_{k+1}) - X(t_k)|^p\right] \ge c_p \sup_{\pi_n} \sum_{k=0}^{n-1} e_X^p(t_k, t_{k+1})$

we conclude that the condition

(1) $\sup_{\pi_n} \sum_{k=0}^{n-1} e_X^p(t_k, t_{k+1}) < \infty$

is necessary for the Gaussian process $X$ to have trajectories of
finite $p$-variation with probability 1.


Take the particular case $p = 1$. The equality

$E\left[\sup_{\pi_n} \sum_{k=0}^{n-1} |X(t_{k+1}) - X(t_k)|\right] = \sup_{\pi_n} \sum_{k=0}^{n-1} e_X(t_k, t_{k+1})$

shows that if $p = 1$, then condition (1) is also sufficient to ensure
that the variation $V_1(X)$ of order 1 is finite. If $p > 1$, condition (1) is
not sufficient to ensure that $V_p(X) < \infty$ a.s. To demonstrate this,
consider the Wiener process $w$ again. In particular, for $p = 2$ we
have

$\sup_{\pi_n} \sum_{k=0}^{n-1} e_w^2(t_k, t_{k+1}) < \infty$

that is, (1) is satisfied. However, the Wiener process w has an 
infinite variation on every interval. Therefore the finiteness of the 
expected p-variation, p > 1, does not in general imply that the 
trajectories of the process have a.s. finite p-variation for p = I. 


(ii) Let us now consider some properties of the quadratic variation
$V_2(w, \pi_n)$ of the Wiener process $w$, which is defined by

(2) $V_2(w, \pi_n) = \sum_{k=0}^{n-1} [w(t_{k+1}) - w(t_k)]^2.$

It is useful to recall the following classical result (see Lévy 1940).
If the partition $\pi_n$ is defined by $\{k2^{-n}, k = 0, \dots, 2^n\}$ then with
probability 1

$V_2(w, \pi_n) = \sum_{k=0}^{2^n - 1} \left[w\left(\frac{k+1}{2^n}\right) - w\left(\frac{k}{2^n}\right)\right]^2 \to 1 \quad \text{as } n \to \infty.$

(Note that the limit value 1 is simply the length of the interval
[0,1].) Obviously, in this particular case the diameter of $\pi_n$ is
$d_n = 2^{-n}$ which tends to 0 'very quickly' as $n \to \infty$. Thus we come to
the question of the limit behaviour of $V_2(w, \pi_n)$ as $n \to \infty$ and
$d_n \to 0$. Dudley (1973) proved that the condition $d_n = o(1/\log n)$
implies that $V_2(w, \pi_n) \to 1$ as $n \to \infty$. Even in more general
situations he has shown that $o(1/\log n)$ is the 'best possible' order
of $d_n$. More precisely, there exists a sequence $\{\pi_n\}$ of partitions of
the interval [0,1] with $d_n = O(1/\log n)$ and such that $V_2(w, \pi_n)$ does
not converge a.s. to 1 as $n \to \infty$; $V_2(w, \pi_n)$ will converge a.s. to a
number which is (strictly) greater than 1. A paper by Fernandez De
La Vega (1974) gives details concerning the construction of $\{\pi_n\}$
with $d_n = O(1/\log n)$ and proof that the quadratic variation
$V_2(w, \pi_n)$ converges a.s. to a number $1 + \delta$ where $\delta > 0$.
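
A quick computation shows why the exponent 2 is special. The sketch below evaluates the expected value of $\sum_k |w(t_{k+1}) - w(t_k)|^p$ along the uniform partition of [0,1] into $n$ cells, using the standard formula $E|\xi|^p = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$ for $\xi$ distributed $N(0,1)$. It concerns expectations along one fixed partition only, not the supremum over all partitions and not almost sure limits; the function names are hypothetical.

```python
import math


def abs_moment_std_normal(p: float) -> float:
    """E|xi|^p for xi ~ N(0, 1)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def expected_power_variation(n: int, p: float) -> float:
    """E of sum_k |w((k+1)/n) - w(k/n)|^p over the uniform n-cell partition of [0, 1]."""
    # Each increment is N(0, 1/n), so E|increment|^p = (1/n)^(p/2) * E|xi|^p.
    return n * (1.0 / n) ** (p / 2.0) * abs_moment_std_normal(p)


if __name__ == "__main__":
    for p in (1.0, 2.0, 3.0):
        print(p, [expected_power_variation(2 ** k, p) for k in (4, 8, 12)])
```

For $p = 2$ the expected sum equals 1 for every $n$, matching the limit value in Lévy's result; for $p = 1$ it grows like $\sqrt{2n/\pi}$ and for $p > 2$ it tends to 0.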


(iii) Finally, let us mention another interesting result. It can be
shown that if the diameter $d_n$ of the partition $\pi_n$ of the interval
[0,1] is of order less than $(1/\log n)^{\alpha}$ for any $0 < \alpha < 1$, then the
quadratic variation $V_2(w, \pi_n)$ of the Wiener process $w$ diverges a.s.
as $n \to \infty$. For details we refer the reader to a paper by Wrobel
(1982).


24.9. A Wiener process with respect to different filtrations 


The Wiener process w = (wy, t > 0) obeys several useful properties. 
One of them is that w is a martingale which, moreover, 1s square 
integrable (see Liptser and Shiryaev 1977/78; Kallianpur 1980; 
Durrett 1984; Protter 1990). 

Recall, however, that by saying that some process $M = (M_t, t \ge 0)$ is
a martingale we mean, in particular, that $M$ is adapted with respect to
a suitable filtration $(\mathcal{F}_t, t \ge 0)$, that is for each $t \ge 0$, $M_t$ is
$\mathcal{F}_t$-measurable. In the case of the Wiener process $w$ we can start
with some of its definitions and establish that $w$ is a martingale with
respect to its own generated filtration $(\mathcal{F}_t^w, t \ge 0)$:
$\mathcal{F}_t^w = \sigma\{w_s, s \le t\}$. Note that in general a
process can be adapted to different filtrations; in particular, a
process can be a martingale with respect to different filtrations. Hence
it is interesting to consider the following question. What is the role of
the filtration and what happens if we replace one filtration by
another? One possible answer will be given in the example below.

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $(\mathcal{X}_t, t \ge 0)$ and
$(\mathcal{Y}_t, t \ge 0)$ be two filtrations on this space. Suppose $w = (w_t, t \ge 0)$ is a
Wiener process with respect to each of the filtrations $(\mathcal{X}_t, t \ge 0)$
and $(\mathcal{Y}_t, t \ge 0)$. Now let us define a new filtration, say $(\mathcal{F}_t, t \ge 0)$,
where $\mathcal{F}_t = \mathcal{X}_t \vee \mathcal{Y}_t$ is the $\sigma$-field generated by the union of $\mathcal{X}_t$
with $\mathcal{Y}_t$. How is the process $w = (w_t, t \ge 0)$ connected with the new
filtration $(\mathcal{F}_t, t \ge 0)$? In particular, is it true that $w$ is a Wiener
process with respect to $(\mathcal{F}_t, t \ge 0)$? Intuitively we could expect that
the answer to the last question is positive. However, the example
below shows that such a conjecture is false.

Suppose we have found two r.v.s, say $X$ and $Y$, which satisfy the
following three conditions:

(a) $X$ does not depend on the process $w = (w_t, t \ge 0)$;

(b) $Y$ does not depend on the process $w = (w_t, t \ge 0)$;

(c) the process $w = (w_t, t \ge 0)$ and the $\sigma$-field $\sigma(X, Y)$ generated
by the r.v.s $X$ and $Y$ are dependent.

Now, denote $\mathcal{F}_t^w = \sigma\{w_s, s \le t\}$ and define $\mathcal{X}_t$ and $\mathcal{Y}_t$ as:

$$\mathcal{X}_t = \mathcal{F}_t^w \vee \sigma(X), \qquad \mathcal{Y}_t = \mathcal{F}_t^w \vee \sigma(Y).$$

It is easy to see that the new filtration $(\mathcal{F}_t, t \ge 0)$ where
$\mathcal{F}_t = \mathcal{X}_t \vee \mathcal{Y}_t$ is such that $\mathcal{F}_t = \mathcal{F}_t^w \vee \sigma(X, Y)$.

Clearly $w = (w_t, t \ge 0)$ is a Wiener process with respect to each
of the filtrations $(\mathcal{X}_t, t \ge 0)$ and $(\mathcal{Y}_t, t \ge 0)$. However, $w = (w_t, t \ge 0)$
is not a Wiener process with respect to $(\mathcal{F}_t, t \ge 0)$, which follows
from condition (c).


Hence it only remains for us to construct r.v.s $X$ and $Y$ satisfying
conditions (a), (b) and (c). For simplicity we consider the Wiener
process $w$ in the interval $[0,1]$.

Let $X$ be a r.v. with $P[X = 1] = P[X = -1] = \frac{1}{2}$ and suppose $X$ is
independent of $w = (w_t, 0 \le t \le 1)$. Define the r.v. $Y$ by
$Y = \frac{1}{2}|X + \operatorname{sign}(w_1)|$. Obviously the r.v. $\operatorname{sign}(w_1)$ is
$\sigma(X, Y)$-measurable (it equals $X$ on $[Y = 1]$ and $-X$ on $[Y = 0]$) and thus
condition (c) is satisfied. Condition (a) is satisfied by construction.
Let us check the validity of (b). For this, let $0 \le t_1 < t_2 < \cdots < t_n \le 1$
be any subdivision of $[0,1]$. Then for arbitrary continuous and
bounded functions $f(x)$, $x \in \mathbb{R}^1$, and $g(x_1, \ldots, x_n)$, $(x_1, \ldots, x_n) \in \mathbb{R}^n$,
we have

$$E[f(Y) g(w_{t_1}, \ldots, w_{t_n})] = E\{E[f(Y) g(w_{t_1}, \ldots, w_{t_n}) \mid X, w_1]\} = E\{f(Y) E[g(w_{t_1}, \ldots, w_{t_n}) \mid w_1]\}.$$

Since $E[g(w_{t_1}, \ldots, w_{t_n}) \mid w_1]$ is a (measurable) function of $w_1$ only, it
is sufficient to show that the variables $Y$ and $w_1$ are independent.
Obviously, 1 and 0 are the possible values of $Y$: we have $Y = 1$ if
$X = \operatorname{sign}(w_1)$ and $Y = 0$ otherwise. Hence for any Borel set
$B \subset (0, \infty)$ the relation $[Y = 0] \cap [w_1 \in B] = [X = -1] \cap [w_1 \in B]$ holds,
while for $B \subset (-\infty, 0)$ we have $[Y = 0] \cap [w_1 \in B] = [X = 1] \cap [w_1 \in B]$.
Since $X$ is independent of $w_1$ and $P[w_1 = 0] = 0$, for any set $B \in \mathcal{B}^1$ we
have

$$P\{[Y = 0] \cap [w_1 \in B]\} = \tfrac{1}{2} P[w_1 \in B].$$

Analogously,

$$P\{[Y = 1] \cap [w_1 \in B]\} = \tfrac{1}{2} P[w_1 \in B].$$

Therefore the variables $Y$ and $w_1$ are independent and so
condition (b) is also satisfied.
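
The construction of $X$ and $Y$ is simple enough to be checked by simulation. The sketch below is an illustration only: the helper names `sample_triples` and `sign_from_xy`, the sample size and the test set $B = (0.5, \infty)$ are our own assumptions. It checks that $\operatorname{sign}(w_1)$ is recovered from $(X, Y)$ and that the empirical value of $P\{[Y = 1] \cap [w_1 \in B]\}$ is close to $\frac{1}{2} P[w_1 \in B]$.

```python
import random

def sample_triples(n, seed=0):
    """Draw n independent triples (X, Y, w_1): X = +-1 with probability 1/2,
    w_1 ~ N(0, 1) independent of X, and Y = |X + sign(w_1)| / 2."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.choice((-1, 1))
        w1 = rng.gauss(0.0, 1.0)
        s = 1 if w1 > 0 else -1
        y = abs(x + s) // 2          # Y takes the values 0 and 1 only
        out.append((x, y, w1))
    return out

def sign_from_xy(x, y):
    """Recover sign(w_1) from (X, Y): Y = 1 means X and sign(w_1) agree."""
    return x if y == 1 else -x

triples = sample_triples(20000)
# Condition (c): sign(w_1) is a function of the pair (X, Y).
recovered = all(sign_from_xy(x, y) == (1 if w1 > 0 else -1) for x, y, w1 in triples)
# Condition (b): compare P[Y = 1, w_1 in B] with P[w_1 in B] / 2 for B = (0.5, inf).
p_b = sum(w1 > 0.5 for _, _, w1 in triples) / len(triples)
p_joint = sum(y == 1 and w1 > 0.5 for _, y, w1 in triples) / len(triples)
print(recovered, round(p_joint, 3), round(p_b / 2, 3))
```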


24.10. How to enlarge the filtration and preserve the Markov
property of the Brownian bridge


Let $X = (X_t, t \ge 0)$ be a real-valued Markov process on the
probability space $(\Omega, \mathcal{F}, P)$ and let $E|X_t| < \infty$ for all $t \ge 0$. For $s$, $t$
and $u$ with $0 \le s < t < u$, let us call $t$ the 'present' time. Then, with
respect to the 'present' time $t$, the $\sigma$-field $\mathcal{H}_s = \sigma\{X_v, v \le s\}$ is the
'past' (the 'history') of the process $X$ up to time $s$, while the $\sigma$-field
$\mathcal{F}_u = \sigma\{X_v, v \ge u\}$ is called the 'future' of $X$ from time $u$ on.

Denote by $\mathcal{H}_s \vee \mathcal{F}_u$, $s < u$, the minimal $\sigma$-field generated by the
union of $\mathcal{H}_s$ and $\mathcal{F}_u$. The Markov property of $X$, usually written as
$P[X_t \in \Gamma \mid \mathcal{H}_s] = P[X_t \in \Gamma \mid X_s]$ a.s., can be expressed in the
following equivalent form involving the 'past' and the 'future' in a
symmetric way (see e.g. Al-Hussaini and Elliott 1989):

$$E[X_t \mid \mathcal{H}_s \vee \mathcal{F}_u] = E[X_t \mid X_s, X_u] \quad \text{a.s.} \tag{1}$$

It is not difficult to derive from (1) the corollary:

$$E[X_t \mid \mathcal{H}_s \vee \sigma(X_u)] = E[X_t \mid X_s, X_u] \quad \text{a.s.} \tag{2}$$


Our goal now is to determine if the 'information' $\mathcal{H}_s \vee \sigma(X_u)$
can be enlarged whilst still keeping the Markov property (2). Let
us consider a standard Wiener process $w = (w_t, t \ge 0)$ and let
$1 < s < t < 2$. By $\mathcal{H}_s^w = \sigma\{w_v, v \le s\}$ we denote the 'past' of $w$ about the
'present' time $t$ and $\sigma\{w_1 + w_2\}$ is the $\sigma$-field generated by the r.v.
$w_1 + w_2$. Note that the value $w_2$ plays the role of the fixed 'future'
of the process $w_t$ at time $t = 2$. In such a case we speak about a
Brownian bridge process.

We want to compare two conditional expectations,
$E[w_t \mid \mathcal{H}_s^w \vee \sigma(w_1 + w_2)]$ and $E[w_t \mid w_s, w_1 + w_2]$. In view of (2) we
could suggest that these two quantities coincide. Let us check if
this is true. Since $w_1$ is $\mathcal{H}_s^w$-measurable, the Markov property of $w$ implies

$$E[w_t \mid \mathcal{H}_s^w \vee \sigma(w_1 + w_2)] = E[w_t \mid \mathcal{H}_s^w \vee \sigma(w_2)] = E[w_t \mid w_s, w_2] \quad \text{a.s.}$$

Since $w$ is a Gaussian process with independent increments, we
can easily derive the following two relations:

$$E[w_t \mid w_s, w_2] = \frac{(2 - t) w_s + (t - s) w_2}{2 - s} \quad \text{a.s.}$$

and

$$E[w_t \mid w_s, w_1 + w_2] = \frac{[5s - (1 + s)(1 + t)] w_s + s(t - s)(w_1 + w_2)}{5s - (1 + s)^2} \quad \text{a.s.}$$

(The second relation is the linear regression of $w_t$ on $w_s$ and $w_1 + w_2$,
computed from $E[w_s(w_1 + w_2)] = 1 + s$, $E[w_t(w_1 + w_2)] = 1 + t$ and
$E[(w_1 + w_2)^2] = 5$.) Thus we have shown that

$$E[w_t \mid \mathcal{H}_s^w \vee \sigma(w_1 + w_2)] \ne E[w_t \mid w_s, w_1 + w_2].$$

Hence the Markov property expressed by (2) will not be
preserved if the 'past' $\mathcal{H}_s^w$ is enlarged by 'new information' taken
strictly from the 'future'.
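
Both conditional expectations in this example are linear regressions of Gaussian variables, so their coefficients can be obtained by solving a $2 \times 2$ system of normal equations built from the covariance $E[w_u w_v] = \min\{u, v\}$. The following sketch (the function names and the sample values $s = 1.2$, $t = 1.7$ are our own choices) computes these coefficients and makes the difference between the two expressions visible.

```python
def solve_2x2(g11, g12, g22, c1, c2):
    """Solve [[g11, g12], [g12, g22]] [a, b]^T = [c1, c2]^T by Cramer's rule."""
    det = g11 * g22 - g12 * g12
    return (c1 * g22 - c2 * g12) / det, (g11 * c2 - g12 * c1) / det

def coeffs_given_ws_w2(s, t):
    """Coefficients (a, b) in E[w_t | w_s, w_2] = a*w_s + b*w_2, 1 < s < t < 2."""
    # Cov(w_s, w_s) = s, Cov(w_s, w_2) = s, Cov(w_2, w_2) = 2,
    # Cov(w_t, w_s) = s, Cov(w_t, w_2) = t.
    return solve_2x2(s, s, 2.0, s, t)

def coeffs_given_ws_sum(s, t):
    """Coefficients (a, b) in E[w_t | w_s, w_1 + w_2] = a*w_s + b*(w_1 + w_2)."""
    # Cov(w_s, w_1 + w_2) = 1 + s, Var(w_1 + w_2) = 5, Cov(w_t, w_1 + w_2) = 1 + t.
    return solve_2x2(s, 1.0 + s, 5.0, s, 1.0 + t)

s, t = 1.2, 1.7
print(coeffs_given_ws_w2(s, t))   # compare with ((2 - t)/(2 - s), (t - s)/(2 - s))
print(coeffs_given_ws_sum(s, t))  # the second entry multiplies w_1 as well as w_2
```

In the second regression the variable $w_1$ enters with a non-zero coefficient, whereas $E[w_t \mid w_s, w_2]$ does not involve $w_1$ at all; this is one way to see that the two conditional expectations differ.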


SECTION 25. DIVERSE PROPERTIES OF STOCHASTIC 
PROCESSES 


This section covers only a few counterexamples concerning 
different properties of stochastic processes. All new notions are 
defined in the examples themselves. Obviously, far more 
counterexamples could be considered here, but for various reasons 
we have restricted ourselves to indicating additional sources of 
interesting but rather complicated counterexamples in the
Supplementary Remarks.


25.1. How can we find the probabilistic characteristics of a 
function of a stationary Gaussian process? 


Let $X = (X_t, t \in \mathbb{R}^1)$ be a real-valued stationary Gaussian process
with $EX_t = 0$, $t \in \mathbb{R}^1$, and covariance function $C(t) = E[X_s X_{s+t}]$,
$s, t \in \mathbb{R}^1$. Then the finite-dimensional distributions of $X$, and hence any
other probabilistic characteristics, are completely determined by
$C(t)$, $t \in \mathbb{R}^1$. In other words, if $X$ is any Gaussian process and we
know its moments of order 1 and 2 (that is, we know the mean
function $EX_t$ and the covariance function $E[X_s X_t]$), then we can
find all probabilistic characteristics of $X$. It is interesting to clarify
whether a similar fact holds for the process $Y = (Y_t, t \in \mathbb{R}^1)$ which
is a function of $X$. We consider the following particular case:

$$Y_t = X_t^2, \quad t \in \mathbb{R}^1.$$

Does there exist a universal constant $m$ such that the moments of $Y$
of order not greater than $m$ are enough to determine the
distributions of $Y$? As was mentioned above, for Gaussian
processes the answer is positive and $m = 2$.

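For fixed $\varepsilon \in \mathbb{R}^1$ with $|\varepsilon| < 1$ and integer $n \ge 2$, introduce the
function

$$f(\lambda) = \lambda^{-2} (1 - \cos \lambda)(1 + \varepsilon \cos n\lambda), \quad \lambda \in \mathbb{R}^1.$$

It can be shown that the Fourier transform $C_{\varepsilon,n}$ of $f$ has the form:

$$C_{\varepsilon,n}(t) = \begin{cases} 1 - |t|, & \text{if } |t| \le 1 \\ \frac{1}{2}\varepsilon (1 - |t - n|), & \text{if } |t - n| \le 1 \\ \frac{1}{2}\varepsilon (1 - |t + n|), & \text{if } |t + n| \le 1 \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

Moreover, for $|\varepsilon| < 1$ and $n \ge 2$ the function $C_{\varepsilon,n}(t)$, $t \in \mathbb{R}^1$, is
positive definite. Therefore (see Doob 1953; Ash and Gardner
1975) there is a centred stationary Gaussian process, say $X$, with
covariance function equal to $C_{\varepsilon,n}$.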


Now take $Y_t = X_t^2$, $t \in \mathbb{R}^1$, and suppose that we know all the
moments of $Y$ of order not greater than $m$ where $m$ is a fixed
natural number. This means that we know the quantities
$E[Y_{t_1} Y_{t_2} \cdots Y_{t_k}]$ for all $k \le m$ and arbitrary $t_1, \ldots, t_k \in \mathbb{R}^1$. However,

$$E[Y_{t_1} Y_{t_2} \cdots Y_{t_k}] = E[X_{t_1}^2 X_{t_2}^2 \cdots X_{t_k}^2] \tag{2}$$

and since $X_{t_1}^2 X_{t_2}^2 \cdots X_{t_k}^2$ is a product of $2k$ Gaussian r.v.s (this
includes the repetitions), applying the well known Wick lemma we
obtain

$$E[Y_{t_1} Y_{t_2} \cdots Y_{t_k}] = \sum_{\pi} C_{\varepsilon,n}(t_{\pi 1} - t_{\pi k}) \, C_{\varepsilon,n}(t_{\pi 2} - t_{\pi 1}) \cdots C_{\varepsilon,n}(t_{\pi k} - t_{\pi(k-1)})$$

where this summation is taken over the group of all permutations $\pi$
of the $k$ elements $1, 2, \ldots, k$.


Now we show (by an appropriate choice of $n$) that the
information contained in (2) is not sufficient to determine the sign
of the parameter $\varepsilon$. Indeed, let us first clarify which of the terms in
(2) give a non-zero contribution. It is easy to see that non-zero
terms are those in which each difference $|t_{\pi j} - t_{\pi(j-1)}|$ is either
smaller than 1 or is between $n - 1$ and $n + 1$. This observation is
based on the explicit form (1) of the covariance function $C_{\varepsilon,n}$;
together with the equality $(t_{\pi 1} - t_{\pi k}) + \cdots + (t_{\pi k} - t_{\pi(k-1)}) = 0$ it
implies that if we choose $n$ such that $n > 2m \ge 2k$ then the number
of factors in each term of (2) whose arguments are close to $n$ is the
same as the number of factors with arguments close to $(-n)$. Obviously
this means that the parameter $\varepsilon$ in (2) has an even power and thus its
sign is lost.

We have shown that for an arbitrary positive integer $m$, there
exists a centred stationary Gaussian process $X$ such that the
moments of order not greater than $m$ of $Y_t = X_t^2$, $t \in \mathbb{R}^1$, are not
sufficient to determine the distributions of $Y$.
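
The loss of the sign of $\varepsilon$ can be seen on a small numerical example. The sketch below is an illustration under our own choices (the names `cov` and `cyclic_sum`, the points $0$, $0.4$, $7.3$ and the values $n = 7$, $\varepsilon = \pm 0.5$ are assumptions): it evaluates the covariance (1) and the sum of cyclic products over all permutations for $\varepsilon$ and for $-\varepsilon$.

```python
from itertools import permutations

def cov(t, eps, n):
    """Covariance C_{eps,n}(t) as in (1): a unit triangle at 0 plus two
    triangles of height eps/2 centred at +n and -n."""
    if abs(t) <= 1:
        return 1 - abs(t)
    if abs(t - n) <= 1:
        return 0.5 * eps * (1 - abs(t - n))
    if abs(t + n) <= 1:
        return 0.5 * eps * (1 - abs(t + n))
    return 0.0

def cyclic_sum(times, eps, n):
    """Sum over permutations p of the cyclic products
    C(t_p1 - t_pk) C(t_p2 - t_p1) ... C(t_pk - t_p(k-1))."""
    total = 0.0
    for p in permutations(times):
        prod = 1.0
        for j in range(len(p)):
            prod *= cov(p[j] - p[j - 1], eps, n)   # p[-1] closes the cycle
        total += prod
    return total

times = (0.0, 0.4, 7.3)   # k = 3 points, with n = 7 > 2k
print(cov(7.0, 0.5, 7), cov(7.0, -0.5, 7))                     # the covariance sees the sign
print(cyclic_sum(times, 0.5, 7), cyclic_sum(times, -0.5, 7))   # the cyclic sums do not
```

Every non-zero cyclic product here contains one factor with argument near $n$ and one with argument near $-n$, so $\varepsilon$ enters only through $\varepsilon^2$.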


25.2. Cramér representation, multiplicity and spectral type of
stochastic processes


Let $X = (X(t), t \ge 0)$ be a real-valued $L^2$-stochastic process defined
on a given probability space $(\Omega, \mathcal{F}, P)$. Denote by $H_t(X)$ the
closed (in $L^2$-sense) linear manifold generated by the r.v.s
$\{X(s), 0 \le s \le t\}$ and let

$$H(X) = \bigvee_{t \in \mathbb{R}^+} H_t(X).$$

Suppose now that $Y = (Y(t), t \ge 0)$ is an $L^2$-process with
orthogonal increments. Then $H_t(Y)$ coincides with the set of all
integrals $\int_0^t g(u)\,dY(u)$ where $\int_0^t g^2(u)\,dF(u) < \infty$ and
$dF(u) = E[(dY(u))^2]$. Thus we come to the following interesting and
important problem. For a given process $X$, find a process $Y$ with
orthogonal increments such that

$$H_t(X) = H_t(Y) \quad \text{for each } t \ge 0. \tag{1}$$


Regarding this problem and other related topics we refer the reader
to works by Cramér (1960, 1964), Hida (1960), Ivkovic et al
(1974) and Rozanov (1977). In particular, Hida (1960) suggested
the first example of a process $X$ for which relation (1) is not
possible. Take two independent standard Wiener processes,
$w_1 = (w_1(t), t \ge 0)$ and $w_2 = (w_2(t), t \ge 0)$, and define the process
$X = (X(t), t \ge 0)$ by

$$X(t) = \begin{cases} w_1(t), & \text{if } t \text{ is rational} \\ w_2(t), & \text{if } t \text{ is irrational.} \end{cases}$$

Obviously, $X$ is discontinuous at each $t$ and we have the
representation

$$H_t(X) = H_t(w_1) \oplus H_t(w_2), \quad t > 0$$

where as usual the symbol $\oplus$ denotes the sum of orthogonal
subspaces.

The general solution of the coincidence problem (1) was found
by Cramér (1964) and can be described as follows. Let $F_1, F_2, \ldots, F_N$
be an arbitrary sequence of measures on $\mathbb{R}^+$ ordered by absolute
continuity, namely:

$$F_1 \succ F_2 \succ \cdots \succ F_N. \tag{2}$$

Here $N$ is either a fixed natural number or infinity. Then there
exists a continuous process $X$ and $N$ mutually orthogonal processes
$Y_1, \ldots, Y_N$ each with orthogonal increments and
$dF_n(t) = E[(dY_n(t))^2]$, $n = 1, \ldots, N$, such that

$$H_t(X) = \sum_{n=1}^{N} \oplus H_t(Y_n), \quad t \ge 0. \tag{3}$$

This general result implies in particular that

$$X(t) = \sum_{n=1}^{N} \int_0^t g_n(t, u)\,dY_n(u) \tag{4}$$

where the functions $g_n$, $n = 1, \ldots, N$, satisfy the condition

$$\sum_{n=1}^{N} \int_0^t g_n^2(t, u)\,dF_n(u) < \infty.$$

The equality (4) is called the Cramér representation for the
process $X$ while the sequence (2) is called the spectral type of $X$.
Finally, the number $N$ in (2) (also in (3) and (4)) is called the
multiplicity of $X$.

Our purpose now is to illustrate the relationships between the 
notions introduced above by suitable examples. 


(i) Suppose $Y = (Y(t), t \ge 0)$ is an arbitrary $L^2$-process with
orthogonal increments and $E[(dY(t))^2] = dt$. Consider the process
$X = (X(t), t \ge 0)$,

$$X(t) = \int_0^t h(t, u)\,dY(u) \tag{5}$$

where $h$ is some (deterministic) function. Comparing (5) and (4)
we see that $X$ has a Cramér-type representation and it is natural to
expect that the multiplicity of $X$ is equal to 1. However, the kernel
$h$ can be chosen such that the multiplicity of $X$ is greater than 1.
Indeed, take

$$h(t, u) = 0, \quad \text{if } 0 \le t \le t_0 \text{ and } a \le u \le b < t_0$$

where $0 < a < b < t_0$ are fixed numbers. Since for $t > 0$ any
increment $Y(d) - Y(c)$ with $a \le c < d \le b$ belongs to $H_t(Y)$ and is
orthogonal to $H_t(X)$, we conclude that $H_t(X) \subset H_t(Y)$ (the
inclusion is strict). Further, the function $h$ can be chosen
arbitrarily for $t > t_0$ and $u \in [a, b]$. Take, for example,

$$h(t, u) = \begin{cases} \sin u, & \text{if } t \text{ is rational} \\ \cos u, & \text{if } t \text{ is irrational} \end{cases}$$

and suppose that $b - a = 2\pi k$ for some natural number $k$. Then for
any $t > t_0$, $X(t)$ is equal either to $Z_1$ or to $Z_2$ where $Z_1$ and $Z_2$ are
r.v.s defined by

$$Z_1 = \int_a^b \sin u\,dY(u), \qquad Z_2 = \int_a^b \cos u\,dY(u).$$

Obviously,

$$E[Z_1 Z_2] = \int_a^b \sin u \cos u\,du = \int_0^{2\pi k} \sin u \cos u\,du = 0$$

which means that $Z_1$ and $Z_2$ are orthogonal. Moreover, both $Z_1$ and
$Z_2$ are orthogonal to the space $H_{t_0}(X)$. Thus for any $t > t_0$, $H_t(X)$
contains $H_{t_0}(X)$, $Z_1$ and $Z_2$. Consequently

$$\bigcap_{\delta > 0} H_{t_0 + \delta}(X) = H_{t_0}(X) \oplus Z_1 \oplus Z_2.$$

According to a result by Cramér (1960), the point $t_0$ is a point of
increase of the space $H_t(X)$ with dimension equal to 2. Therefore
the multiplicity of the process $X$ at time $t_0$ cannot be less than 2.
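
The orthogonality of $Z_1$ and $Z_2$ rests on the fact that $\sin u \cos u$ integrates to zero over an interval of length $2\pi k$. A short numerical check, given only as a sketch (the function name `inner_product_z1_z2`, the midpoint rule and the value $a = 0.3$ are our own choices), is the following.

```python
import math

def inner_product_z1_z2(a, b, steps=100_000):
    """Midpoint-rule approximation of E[Z1 Z2] = integral_a^b sin(u) cos(u) du,
    which is the L2 inner product when E[(dY(u))^2] = du."""
    h = (b - a) / steps
    return h * sum(math.sin(a + (i + 0.5) * h) * math.cos(a + (i + 0.5) * h)
                   for i in range(steps))

a = 0.3
print(inner_product_z1_z2(a, a + 2 * math.pi))   # b - a = 2*pi*k with k = 1
print(inner_product_z1_z2(a, a + 1.0))           # a generic interval, for contrast
```

For an interval whose length is not a multiple of $\pi$ the inner product is in general non-zero, which shows why the condition on $b - a$ is imposed.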


(ii) Let $X_1 = (X_1(t), t \ge 0)$ and $X_2 = (X_2(t), t \ge 0)$ be Gaussian
processes. Denote by $P_1$ and $P_2$ the probability measures induced
by these processes in the sample function space. It is well known
that $P_1$ and $P_2$ are either equivalent or singular (see Ibragimov and
Rozanov 1978). The question of whether $P_1$ and $P_2$ are equivalent
is closely related to some property of the corresponding spectral
types of the processes $X_1$ and $X_2$. The following result can be
found in the book by Rozanov (1977). If the measures $P_1$ and $P_2$
of the Gaussian processes $X_1$ and $X_2$ are equivalent, then $X_1$ and
$X_2$ have the same spectral type. The next example shows whether
the converse is true.

Consider two processes, the standard Wiener process $w = (w(t), t \ge 0)$
and the process $\xi = (\xi(t), t \ge 0)$ defined by

$$\xi(t) = h(t) w(t), \quad t \ge 0.$$

Here $h$ is a function which will be chosen in a special way: $h$ is a
non-random continuous function which is not absolutely
continuous in any interval. Additionally, let $0 < c_1 \le h(t) \le c_2 < \infty$
for some constants $c_1$, $c_2$ and all $t$. It is obvious that
$H_t(\xi) = H_t(w)$ for each $t \ge 0$. This implies that the processes $\xi$
and $w$ have the same spectral type.

Denote by $P_w$ and $P_\xi$ the measures in the space $C(\mathbb{R}^+)$ induced
by the processes $w$ and $\xi$ respectively. Clearly, it remains for us to
see whether these measures are equivalent. Indeed, if $C_w$ and $C_\xi$
are the covariance functions of $w$ and $\xi$, then the difference
between them is

$$\Delta(s, t) = C_w(s, t) - C_\xi(s, t) = [1 - h(s) h(t)] \min\{s, t\}.$$

Since the function $\Delta(s, t)$, $(s, t) \in \mathbb{R}_+^2$, is not absolutely continuous,
using a well known criterion (see Ibragimov and Rozanov 1978)
we conclude that the measures $P_w$ and $P_\xi$ are not equivalent
despite the coincidence of the spectral types of the processes $w$ and
$\xi$.


25.3. Weak and strong solutions of stochastic differential 
equations 


A large class of stochastic processes can be obtained as solutions 
of stochastic differential equations of the type 


$$X(t) = X_0 + \int_0^t a(s, X(s))\,ds + \int_0^t \sigma(s, X(s))\,dw(s), \quad t \ge 0 \tag{1}$$

where $a$ and $\sigma$, the drift and the diffusion coefficients
respectively, satisfy appropriate conditions, and $\int_0^t (\cdot)\,dw(s)$ is a
stochastic integral (in the sense of K. Itô) with respect to the
standard Wiener process.

Let us define two kinds of solutions of (1), weak and strong, and 
then analyse the relationship between them. 

Let $w = (w(t), t \ge 0)$ be a standard Wiener process on the
probability space $(\Omega, \mathcal{F}, P)$. Suppose that $w$ is adapted to the
family $(\mathcal{F}_t, t \ge 0)$ of non-decreasing sub-$\sigma$-fields of $\mathcal{F}$. If there
exists an $(\mathcal{F}_t)$-adapted process $X = (X(t), t \ge 0)$ satisfying (1) a.s.,
we say that (1) has a strong solution. If (1) has at most one
$(\mathcal{F}_t)$-strong solution, we say that strong uniqueness (pathwise
uniqueness) holds for (1), or that (1) has a unique strong solution.
Further, if there exist a probability space $(\Omega', \mathcal{F}', P')$, a family
$(\mathcal{F}'_t, t \ge 0)$ of non-decreasing sub-$\sigma$-fields of $\mathcal{F}'$, and two
$(\mathcal{F}'_t)$-adapted processes, $X' = (X'(t), t \ge 0)$ and $w' = (w'(t), t \ge 0)$,
such that $w'$ is a standard Wiener process and $(X', w')$ satisfy (1) a.s.,
we say that a weak solution exists. If the law of the process $X'$ is
unique (that is, the measure in the space $C$ generated by $X'$ is unique),
we say that weak uniqueness holds for (1), or that (1) has a unique weak
solution.

There are many books dealing entirely or partially with the 
theory of stochastic differential equations (see Doob 1953; 
McKean 1969; Gihman and Skorohod 1972, 1979; Liptser and 
Shiryaev 1977/78; Krylov 1980; Jacod 1979; Kallianpur 1980; 
Ikeda and Watanabe 1981; Durrett 1984). 

The purpose of the two examples below is to clarify the 
relationship between the weak and strong solutions of (1), looking 
at both aspects, existence and uniqueness. The survey paper by 
Zvonkin and Krylov (1981) provides a very useful and detailed 
analysis of these two concepts (see also Barlow 1982; Barlow and 
Perkins 1984; Protter 1990; Karatzas and Shreve 1991). 

Let us briefly describe the first interesting example in this field.
Consider the stochastic differential equation

$$x(t) = \int_0^t |x(s)|^{\alpha}\,dw(s), \quad t \ge 0$$

where $\alpha > 0$ is a fixed parameter. It can be shown that for $\alpha \ge \frac{1}{2}$ this
equation has only one strong solution (with respect to the family
$(\mathcal{F}_t^w)$), namely $x \equiv 0$. However, for $0 < \alpha < \frac{1}{2}$ it has infinitely many
solutions. For the proof of this result we refer the reader to the
original paper of Girsanov (1962) (see also McKean 1969). Thus
the above stochastic equation has a strong solution for any $\alpha > 0$,
but this strong solution need not be unique.

Among a variety of results concerning the properties of the 
solutions of stochastic differential equations (1), we quote the 
following (see Yamada and Watanabe 1971): strong uniqueness of 
the solution of (1) implies its weak uniqueness. 

Of course, this result is not surprising. However, it can happen 
that a weak solution exists and is unique, but no strong solution 
exists. For details of such an example we refer the reader to a book 
by Stroock and Varadhan (1979). 

Let us now consider an example of a stochastic differential 
equation which has a unique weak solution but several (at least 
two) strong solutions. 

Take the function $\sigma(x) = 1$ if $x \ge 0$ and $\sigma(x) = -1$ if $x < 0$ and
consider the stochastic equation

$$x(t) = x_0 + \int_0^t \sigma(x(s))\,dw(s), \quad t \ge 0. \tag{2}$$

Firstly let us check that (2) has a solution. Suppose for
simplicity that $x_0 = 0$ and let

$$x(t) = \tilde{w}(t) \quad \text{and} \quad w(t) = \int_0^t \sigma(x(s))\,d\tilde{w}(s), \quad t \ge 0,$$

where $\tilde{w}$ is a standard Wiener process. Then $w$ is a continuous local
martingale with $\langle w \rangle_t = t$ and so $w$ is a Wiener process. Moreover,
since $\sigma^2 = 1$, the pair $(x(t), w(t), t \ge 0)$ is a solution
of (2). Hence the stochastic equation (2) has a weak solution.
Weak uniqueness of (2) follows from the fact that for any solution
$x$, the stochastic integral $\int_0^t \sigma(x(s))\,dw(s)$ with the function $\sigma$
defined above, is again a Wiener process.

It remains for us to show that strong uniqueness does not hold
for the stochastic equation (2). Obviously, $\sigma(-x) = -\sigma(x)$ for $x \ne 0$
and if $x_0 = 0$ and $(x(t), t \ge 0)$ is a solution of (2), then the process
$(-x(t), t \ge 0)$ is also its solution.
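
The sign symmetry behind the failure of strong uniqueness can be imitated on a time grid. The following is only a rough discrete-time sketch under our own assumptions (a Gaussian random walk on a grid of mesh `1/steps` in place of the Wiener process, left-point sums in place of stochastic integrals, and the hypothetical helper names `tanaka_paths` and `integrate`). It builds the driving increments from $x$ and then checks that both $x$ and $-x$ are reproduced by the discretised equation (2).

```python
import random

def sigma(x):
    """sigma(x) = 1 for x >= 0 and -1 for x < 0."""
    return 1.0 if x >= 0 else -1.0

def tanaka_paths(steps, seed=0):
    """Discretised weak solution: x is a Gaussian random walk on a grid of
    mesh 1/steps, and the driving increments are dw_k = sigma(x_k) * dx_k."""
    rng = random.Random(seed)
    sd = (1.0 / steps) ** 0.5
    x, xs, dws = 0.0, [0.0], []
    for _ in range(steps):
        dx = rng.gauss(0.0, sd)
        dws.append(sigma(x) * dx)
        x += dx
        xs.append(x)
    return xs, dws

def integrate(path, dws):
    """Left-point sum of sigma(path_k) * dw_k, a discrete stochastic integral."""
    return sum(sigma(p) * dw for p, dw in zip(path, dws))

xs, dws = tanaka_paths(10_000)
neg = [-v for v in xs]
print(xs[-1], integrate(xs, dws))     # x reproduces itself from w
print(neg[-1], integrate(neg, dws))   # -x does too, up to the first step from 0
```

In discrete time the path $-x$ fails to satisfy the equation only on the single step that starts from the point 0, where $\sigma(-0) = \sigma(0) = 1$; the resulting discrepancy equals twice the first increment and vanishes as the mesh tends to zero.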

Moreover, it is not only strong uniqueness which cannot hold for 
equation (2)—the stochastic equation (2) does not have a strong 
solution at all. This can be shown by using the local time technique 
(for details see Karatzas and Shreve 1991). 


25.4. A stochastic differential equation which does not have a 
strong solution but for which a weak solution exists and 
is unique 

Let $(\Omega, \mathcal{F}, P)$ be a complete probability space on which a standard
Wiener process $w = (w_t, t \ge 0)$ is given. Suppose that $a(t, x)$ is a
real-valued function on $[0,1] \times C([0,1])$ defined as follows. Let
$(t_k, k = 0, -1, -2, \ldots)$ be a sequence contained in the interval $[0,1]$ and
such that $t_0 = 1 > t_{-1} > t_{-2} > \cdots \to 0$ as $k \to -\infty$. For $x \in C([0,1])$
let $a(0, x) = 0$ and if $t > 0$, let

$$a(t, x) = \left\{ \frac{x_{t_k} - x_{t_{k-1}}}{t_k - t_{k-1}} \right\} \quad \text{for } t_k \le t < t_{k+1}, \quad k = -1, -2, \ldots$$

where $\{a\}$ denotes the fractional part of the real number $a$ and $x_t$
denotes the value of the continuous function $x$ at the point $t$.
Clearly, $a$ satisfies the usual measurability conditions, $a$ is
$(\mathcal{B}_t)$-adapted where $\mathcal{B}_t = \sigma\{x_s, s \le t\}$ and $\int_0^1 a^2(t, x)\,dt < \infty$ for each
$x \in C([0,1])$.

Consider the following stochastic differential equation:

$$\xi_t = \int_0^t a(s, \xi)\,ds + w_t, \quad t \in [0,1]. \tag{1}$$

Firstly, according to general results given by Liptser and
Shiryaev (1977/78), Stroock and Varadhan (1979) and Kallianpur
(1980), equation (1) has a weak solution and this solution is
unique.


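Let us now determine whether equation (1) has a strong
solution. Suppose the answer is positive, that is (1) has a strong
solution $(\xi_t, 0 \le t \le 1)$ which is $(\mathcal{F}_t^w)$-adapted where
$\mathcal{F}_t^w = \sigma\{w_s, s \le t\}$. Then if $t_k \le t \le t_{k+1}$ we obtain from (1) that

$$\xi_t - \xi_{t_k} = \int_{t_k}^{t} \left\{ \frac{\xi_{t_k} - \xi_{t_{k-1}}}{t_k - t_{k-1}} \right\} ds + w_t - w_{t_k}.$$

Using the notations

$$\eta_k = \frac{\xi_{t_k} - \xi_{t_{k-1}}}{t_k - t_{k-1}}, \qquad \varepsilon_k = \frac{w_{t_k} - w_{t_{k-1}}}{t_k - t_{k-1}},$$

we arrive (taking $t = t_{k+1}$) at the relation

$$\eta_{k+1} = \{\eta_k\} + \varepsilon_{k+1}, \quad k = 0, -1, -2, \ldots.$$

Since we supposed that a strong solution of (1) exists, $\eta_k$ must be
$\mathcal{F}_{t_k}^w$-measurable and, moreover, the family of r.v.s
$\{\eta_m, m = k, k - 1, \ldots\}$ is independent of $\varepsilon_{k+1}$. This independence
and the equality

$$e^{2\pi i \eta_{k+1}} = e^{2\pi i \{\eta_k\}} e^{2\pi i \varepsilon_{k+1}} = e^{2\pi i \eta_k} e^{2\pi i \varepsilon_{k+1}} \tag{2}$$

easily lead to the relation

$$d_{k+1} = d_k E[e^{2\pi i \varepsilon_{k+1}}] = d_k e^{-2\pi^2/(t_{k+1} - t_k)}$$

where we have introduced the notation $d_k = E[e^{2\pi i \eta_k}]$. Thus, for
any $n = 0, 1, \ldots$, we get inductively that

$$d_{k+1} = d_{k-n} \exp\left\{ -2\pi^2 \left( \frac{1}{t_{k+1} - t_k} + \cdots + \frac{1}{t_{k+1-n} - t_{k-n}} \right) \right\}.$$

It follows that $|d_{k+1}| \le e^{-2\pi^2 n}$ for any $n$ and so, letting $n \to \infty$,
we find that $d_{k+1} = 0$. Hence

$$d_k = 0 \quad \text{for } k = 0, -1, -2, \ldots.$$

From (2) and the relation for $\eta_{k+1}$ we find that

$$e^{2\pi i \eta_{k+1}} = e^{2\pi i \eta_{k-n}} \exp\{2\pi i (\varepsilon_{k+1} + \varepsilon_k + \cdots + \varepsilon_{k+1-n})\}$$

and also

$$E[e^{2\pi i \eta_{k+1}} \mid \mathcal{G}^w_{t_{k-n}, t_{k+1}}] = \exp\{2\pi i (\varepsilon_{k+1} + \cdots + \varepsilon_{k+1-n})\} \, d_{k-n}$$

where $\mathcal{G}^w_{t_{k-n}, t_{k+1}} = \sigma\{w_t - w_s, t_{k-n} \le s \le t \le t_{k+1}\}$. Since $d_{k-n} = 0$
we have

$$E[e^{2\pi i \eta_{k+1}} \mid \mathcal{G}^w_{t_{k-n}, t_{k+1}}] = 0.$$

Now, if $n \to \infty$, then $\mathcal{G}^w_{t_{k-n}, t_{k+1}} \uparrow \mathcal{F}^w_{t_{k+1}}$ and since $\eta_{k+1}$ is
$\mathcal{F}^w_{t_{k+1}}$-measurable for each $k$, we come to the equality

$$0 = E[e^{2\pi i \eta_{k+1}} \mid \mathcal{F}^w_{t_{k+1}}] = e^{2\pi i \eta_{k+1}}.$$

It is obvious, however, that this is not possible and this
contradiction is a direct result of our assumption that (1) has a
strong solution. Therefore, despite the fact that the stochastic
differential equation (1) has a unique weak solution, it has no
strong solution.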
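
The key step of the argument is that the factor relating $d_{k+1}$ to $d_{k-n}$ is bounded by $e^{-2\pi^2 n}$ because every interval $t_{j+1} - t_j$ has length at most 1. The sketch below evaluates this factor for one concrete sequence; the choice $t_j = 2^j$, $j \le 0$, and the function name `damping_bound` are assumptions made only for illustration.

```python
import math

def damping_bound(k, n, t=lambda j: 2.0 ** j):
    """exp{-2 pi^2 sum_{j=k-n}^{k} 1/(t_{j+1} - t_j)}, the factor relating
    d_{k+1} to d_{k-n}; t(j) is a hypothetical choice t_j = 2^j, j <= 0."""
    total = sum(1.0 / (t(j + 1) - t(j)) for j in range(k - n, k + 1))
    return math.exp(-2.0 * math.pi ** 2 * total)

# Compare the factor with the cruder bound exp(-2 pi^2 n) used in the text.
for n in range(4):
    print(n, damping_bound(-1, n), math.exp(-2.0 * math.pi ** 2 * n))
```

Already for small $n$ the factor is numerically indistinguishable from zero, in agreement with the conclusion $d_k = 0$.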

In Examples 25.3 and 25.4 we analysed a few stochastic 
differential equations and have seen that the properties of their 
solutions (existence, non-existence, uniqueness, non-uniqueness) 
in the weak and strong sense depend completely on either the drift 
coefficient or the diffusion coefficient. 

More details on stochastic differential equations, not only theory 
but also examples and intricate counterexamples, can be found in 
many books (e.g. Liptser and Shiryaev 1977/78 and 1989; Jacod
1979; Stroock and Varadhan 1979; Kallianpur 1980; Ikeda and
Watanabe 1981; Jacod and Shiryaev 1987; Protter 1990; Karatzas
and Shreve 1991; Revuz and Yor 1991; Rogers and Williams
1994; Nualart 1995).


Supplementary Remarks 


Section 1. Classes of random events 


Examples 1.1, 1.2, 1.3, 1.4 and 1.7 or their modifications can be 
found in many books. These examples are part of so-called 
probabilistic folklore. The idea of Example 1.5 is taken from Bauer 
(1996). Example 1.6 is based on arguments by Neveu (1965) and 
Kingman and Taylor (1966). Other interesting counterexamples 
and ideas for constructing counterexamples can be found in works 
by Chung (1974), Broughton and Huff (1977), Williams (1991) 
and Billingsley (1995). 


Section 2. Probabilities 


Example 2.1 could be classified as folklore. Example 2.2 belongs 
to Breiman (1968). The presentation of Example 2.3 follows that in 
Neveu (1965) and Shiryaev (1995). Example 2.4 was originally 
suggested by Doob (1953) and has since been included in many 
books; see Halmos (1974), Loeve (1978), Laha and Rohatgi 
(1979), Rao (1979) and Billingsley (1995). We refer the reader 
also to works by Pfanzagl (1969), Blake (1973), Rogers and 
Williams (1994) and Billingsley (1995) where other interesting 
counterexamples concerning conditional probabilities can be 
found. 


Section 3. Independence of random events 


Since the concept of independence plays a central role in 
probability theory, it is no wonder that we find it treated in almost 
all textbooks and lecture notes. Many examples concerning the 
independence properties of collections of random events could be 
qualified as probabilistic folklore. For Example 3.1 see Feller 
(1968) or Bissinger (1980). Example 3.2(i), suggested by
Bohlmann (1908), and 3.2(ii), suggested by Bernstein (1928), seem
to be the oldest among all examples included in this book.
Example 3.2(i) is due to Feller (1968) and 3.2(v) to Roussas
(1973). Examples 3.2(iv) and 3.3(ii) were suggested by an
anonymous referee. Example 3.3(ii) is given by Ash (1970) and
Shiryaev (1995). The idea of Examples 3.3(ii) and 3.7 was taken
from Neuts (1973). Example 3.4(i) belongs to Wong (1972) and
case (ii) of the same example was suggested by Ambartzumian
(1982). Example 3.5 is based on the papers of Wang et al (1993)
and Stoyanov (1995). Example 3.6 is considered by Papoulis
(1965). Example 3.7 is given by Sevastyanov et al (1985). For
other counterexamples the reader is referred to works by Lancaster 
(1965), Kingman and Taylor (1966), Crow (1967), Moran (1968), 
Ramachandran (1975), Chow and Teicher (1978), Grimmett and 
Stirzaker (1982), Lopez and Moser (1980), Falk and Bar-Hillel 
(1983), Krewski and Bickis (1984), Wang et al (1993), Stoyanov 
(1995), Shiryaev (1995), Billingsley (1995) and Mori and 
Stoyanov (1995/1996). 


Section 4. Diverse properties of random events and their 
probabilities 


The idea of Example 4.1 came from Gelbaum (1976) and, as the
author noted, case (ii) was originally suggested by E. O. Thorp.
Example 4.2 is folklore. Example 4.3 belongs to Krewski and
Bickis (1984). Example 4.5 is from Rényi (1970). Several other
counterexamples can be found in works by Lehmann (1966), 
Hawkes (1973), Ramachandran (1974), Lee (1985) and Billingsley 
(1995). 


Section 5. Distribution functions of random variables 


Different versions of Examples 5.1, 5.3, 5.6 and 5.7 can be found 
in many sources and definitely belong to folklore. Example 5.2 


was suggested by Zubkov (1986). Examples like 5.5 are noted by
Gnedenko (1962), Cramér (1970) and Laha and Rohatgi (1979).
Case (ii) of Example 5.8 is described by Ash (1970) and case (iii)
is given by Olkin et al (1980). Cases (iv) and (v) of the same
example are considered by Gumbel (1958) and Fréchet (1951). A
paper by Clarke (1975) covers Example 5.9. Example 5.10(i) is
treated by Chung (1953), while case (ii) is presented by
Dharmadhikari and Jogdeo (1974). Example 5.12 follows the
presentation in Dharmadhikari and Joag-Dev (1988). The last
example, 5.13, is described by Hengartner and Theodorescu
(1973). Other counterexamples concerning properties of
one-dimensional and multi-dimensional d.f.s can be found in the works
of Thomasian (1969), Feller (1971), Dall'Aglio (1960, 1972,
1990), Barndorff-Nielsen (1978), Ruschendorf (1991), Rachev
(1991), Mikusinski et al (1992) and Kalashnikov (1994).


Section 6. Expectations and conditional expectations 


Example 6.1 belongs to Simons (1977). Example 6.2 is due to 
Takacs (1985) and is the answer to a problem proposed by 
Emmanuele and Villani (1984). Example 6.4 and other related 
topics can be found in Piegorsch and Casella (1985). Example 6.5, 
suggested by Churchill (1946), is probably the first example to be
found of a non-symmetric distribution with vanishing odd-order
moments. Example 6.6 is indicated by Bauer (1996). Examples 6.7
and 6.8 can be classified as folklore. Example 6.9 belongs to Enis
(1973) (see also Rao (1993)) while Example 6.10 was taken from
Laha and Rohatgi (1979). The idea of Example 6.11 is taken from
Dharmadhikari and Joag-Dev (1988). Finally, Example 6.12
belongs to Tomkins (1975a). Several other counterexamples
concerning the integrability properties of r.v.s, conditional
expectations and some related topics can be found in works by
Robertson (1968), B. Johnson (1974), Witsenhausen (1975), Rao
(1979), Leon and Massé (1992), Bryc and Smolenski (1992), Zieba
(1993) and Rao (1993).


Section 7. Independence of random variables 


Examples 7.1(i), 7.8, 7.9(i) and (ii), 7.10(i), (ii) and (iii), and 7.12
can be described as folklore. Examples 7.1(ii), (iii), and 7.8 follow
some ideas by Feller (1968, 1971). Example 7.2 is due to Pitman
and Williams (1967), who assert that this is the first example of
three pairwise independent but not mutually independent r.v.s in
the absolutely continuous case. Example 7.3(i) is based on a paper
by Wang (1979), case (ii) is considered by Han (1971), while case
(iii) is outlined by Ying (1988). Example 7.4 is based on a paper
by Wang (1990). Runnenburg (1984) is the originator of Example
7.5. Drossos (1984) suggested Example 7.6(i) to me and attributed
it to E. Lukacs and R. Laha. Case (ii) of the same example was
suggested by Falin (1985) and case (iii) is indicated by Rohatgi
(1976). The description of Example 7.7(i) follows an idea of Fisz
(1963) and Laha and Rohatgi (1979). Examples 7.7(ii) and 7.12 are
indicated by Rényi (1970). The idea of Example 7.7(iii) belongs to
Flury (1986). Case (iii) of Example 7.10 was suggested by an
anonymous referee. Example 7.11(i) is taken from Ash and
Gardner (1975). Examples 7.14(i) and (ii) belong to Chow and
Teicher (1978) while case (iii) of the same example is considered
by Cinlar (1975). Case (i) of Example 7.15 follows an idea of
Billingsley (1995) and case (ii) is indicated by Johnson and Kotz
(1977). Finally, Example 7.16 is based on a paper by Kimeldorf
and Sampson (1978). Note that a great number of additional
counterexamples concerning the independence and dependence
properties of r.v.s can be found in works by Geisser and Mantel
(1962), Tsokos (1972), Roussas (1973), Coleman (1974), Chung
(1974), Joffe (1974), Fortet (1977), Ganssler and Stute (1977),
Loeve (1978), Wang (1979), O'Brien (1980), Grimmett and
Stirzaker (1982), Galambos (1984), Gelbaum (1985, 1990),
Heilmann and Schroter (1987), Ahmed (1990), Dall'Aglio (1990),
Dall'Aglio et al (1991), Durrett (1991), Whittaker (1991), Liu and
Diaconis (1993) and Mori (1995).


Section 8. Characteristic and generating functions 


Example 8.1 belongs to Gnedenko (1937) and can be classified as 
one of the most significant classical counterexamples in probability 
theory. Example 8.2 is contained in many books; see those by Fisz 
(1963), Moran (1968) and Ash (1972). Examples 8.3, 8.4, 8.5 and 
8.6, or versions of them, can be found in the book by Lukacs 
(1970) and in later books by other authors. Example 8.7 was 
suggested by Zygmund (1947) and our presentation follows that in 
Rényi (1970) and Lamperti (1966). Example 8.8 is described by 
Wintner (1947) and also by Sevastyanov et al (1985). Example 

8.9 is given by Linnik and Ostrovskii (1977). Finally, Example 
8.10 is presented in a form close to that given by Lukacs (1970) 
and Laha and Rohatgi (1979). Note that other counterexamples on 
the topics in this section can be found in works by Ramachandran 
(1967), Thomasian (1969), Feller (1971), Loéve (1977/1978), 
Chow and Teicher (1978), Rao (1984), Rohatgi (1984), Dudley 
(1989) and Shiryaev (1995). 


Section 9. Infinitely divisible and stable distributions 


Example 9.1 and other versions of it can be classified as folklore. 
Example 9.2 belongs to Gnedenko and Kolmogorov (1954) (see 
also Laha and Rohatgi (1979)). Example 9.3(i) is based on a paper
by Shanbhag et al (1977) and answers a question proposed by
Steutel (1973). Case (ii) of Example 9.3 as well as Example 9.4 are
considered by Rohatgi et al (1990). Example 9.5 is described by
Linnik and Ostrovskii (1977). Example 9.6 belongs to Levy
(1948), but some arguments from Griffiths (1970) are also needed
(also see Rao (1984)). Ibragimov (1972) proposed Example 9.7.
Example 9.8 could also be considered as probabilistic folklore. The 


last example, 9.9, belongs to Lukacs (1970). Let us note that 
several other counterexamples which are interesting but rather 
complicated can be found in works by Ramachandran (1967), 
Steutel (1970), Kanter (1975), O’Connor (1979), Marcus (1983), 
Hansen (1988), Evans (1991), Jurek and Mason (1993), Rutkowski 
(1995) and Bondesson et al (1996). 


Section 10. Normal distribution 


Some of the examples in this section are popular among 
probabilists and statisticians and can be found in different sources. 
In particular, cases (ii), (iii) and (iv) of Example 10.1 are noted 
respectively by Roussas (1973), Morgenstern (1956) and Olkin et 
al (1980). The idea of Example 10.2 is indicated by Papoulis 
(1965). Example 10.3(i) is based on papers by Pierce and Dykstra 
(1969) and Han (1971). Case (ii) of the same example is 
considered by Bühler and Mieshke (1981). Example 10.4(i) in this 
form belongs to Ash and Gardner (1975) and case (iii) is treated by 
Ijzeren (1972). Hamedani and Tata (1975) describe Examples 10.5 
and 10.7, while Example 10.6 is considered by Hamedani (1984). 
Moran (1968) proposed the 
problem of finding a non-normal density such that both conditional 
densities are normal. Example 10.8 presents one of the possible 
answers. Case (i) is a result of my discussions with N. V. Krylov 
and A. M. Zubkov, while case (ii) is due to Ahsanullah and Sinha 
(1986). Example 10.9 is given by Breiman (1969). Finally, 
Example 10.10 was suggested by Kovatchev (1996). Many useful 
facts, including counterexamples, concerning the normal 
distribution can be found in the works of Anderson (1958), Steck 
(1959), Geisser and Mantel (1962), Grenander (1963), Thomasian 
(1969), Feller (1971), Kowalski (1973), Vitale (1978), Hahn and 
Klass (1981), Melnick and Tenenbein (1982), Ahsanullah (1985), 
Devroye (1986), Janson (1988), Castillo and Galambos (1989), 
Whittaker (1991) and Hamedani (1992). 


Section 11. The moment problem 


Example 11.1 follows the presentation of Berg (1988). Example 
11.2(i) was originally suggested by Heyde (1963a) and has since 
been included in many textbooks; see Feller (1971), Rao (1973), 
Billingsley (1995), Laha and Rohatgi (1979). Case (ii) of this 
example belongs to Leipnik (1981). Example 11.3 is considered in 
a recent paper by Targhetta (1990). Example 11.4 is mentioned by 
Hoffmann-Jorgensen (1994), but also see Lukacs (1970) and Berg 
(1988). Example 11.5 follows an idea from Carnal and Dozzi 
(1989). Our presentation of Example 11.6 follows that in Kendall 
and Stuart (1958) and Shiryaev (1995). Examples 11.7 and 11.8 
belong to Jagers (1983) and Fu (1984) respectively. As far as we 
know these are the first examples of this kind in the discrete case 
(also see Schoenberg (1983)). Example 11.9(i) is based on a paper 
by Dharmadhikari (1965). Case (ii) of the same example is 
considered by Chow and Teicher (1978). Both cases of Example 
11.10 belong to Heyde (1963b). Example 11.12 is based on papers 
by Heyde (1963b) and Hall (1981). Example 11.13 is treated by 
Heyde (1975). Note that other counterexamples concerning the 
moment problem as well as related topics can be found in works by 
Fisz (1963), Neuts (1973), Prohorov and Rozanov (1969), Lukacs 
(1970), Schoenberg (1983), Devroye (1986), Berg and Thill 
(1991), Slud (1993), Hoffmann-Jorgensen (1994) and Shiryaev 
(1995). Readers interested in the history of progress in the moment 
problem are referred to works by Shohat and Tamarkin (1943), 
Kendall and Stuart (1958), Heyde (1963b), Akhiezer (1965) and 
Berg (1995). 


Section 12. Characterization properties of some 
probability distributions 


Example 12.1 was suggested by Zubkov (1986). General 
characterization theorems for the binomial distribution can be 
found in Ramachandran (1967) and Chow and Teicher (1978). 
Example 12.2 is given by Klimov and Kuzmin (1985). Example 
12.3 belongs to Steutel (1984) but, according to Jacod (1975), a 
similar result was proved by R. Serfling and included in a preprint 
which unfortunately I have never seen. Example 12.4 is a natural 
continuation of the reasoning in Example 12.1. Example 12.5 
belongs to Philippou and Hadjichristos (1985). Example 12.6 is 
given by Rossberg et al (1985). Example 12.7 is based on an idea 
of Robertson et al (1988). Laha (1958) is the author of Example 
12.8(i), while case (iv) of this example uses an idea from Mauldon 
(1956). Case (v) of Example 12.8 is discussed by Letac (1995). 
Baringhaus and Henze (1989) invented Example 12.9. Example 
12.10 is based on a paper by Blank (1981). The idea of Example 
12.11 can be found in the book by Syski (1991). Example 12.12 is 
outlined by Rohatgi (1976) and Example 12.14 was suggested to 
me by Seshadri (1986). Note that other counterexamples and useful 
facts concerning the characterization-type properties of different 
classes of probability distributions can be found in works by 
Mauldon (1956), Dykstra and Hewett (1972), Kagan et al (1973), 
Gani and Shanbhag (1974), Huang (1975), Galambos and Kotz 
(1978), Ahlo (1982), Azlarov and Volodin (1983), Hwang and Lin 
(1984), Rossberg et al (1985), Too and Lin (1989), 
Balasubrahmanyan and Lau (1991), Letac (1991, 1995), Prakasa 
Rao (1992), Yellott and Iverson (1992), Braverman (1993) and 
Huang and Li (1993). 


Section 13. Diverse properties of random variables 


Example 13.1(i) is folklore, while case (ii) is due to Behboodian 
(1989). Example 13.2 is indicated by Feller (1971), but we have 
followed the presentation given by Kelker (1971). Example 13.3 is 
outlined by Barlow and Proshan (1966). Example 13.4 is based on 
a paper by Pavlov (1978). The notion of exchangeability is 
intensively treated by Feller (1971), Chow and Teicher (1978), 
Laha and Rohatgi (1979) and Aldous (1985). Example 13.5 is 
based on these sources and on discussions with Rohatgi (1986). 
Diaconis and Dubins (1980) suggested Example 13.6, but a similar 
statement can also be found in the book by Feller (1971). The idea 
of Example 13.7 is indicated by Galambos (1987). Example 13.8 
belongs to Taylor et al (1985). Example 13.9 is considered by Gut 
and Janson (1986). Other counterexamples classified as ‘diverse’ 
can be found in the works of Bhattacharjee and Sengupta (1966), 
Ord (1968), Fisher and Walkup (1969), Brown (1972), Burdick 
(1972), Dykstra and Hewett (1972), Klass (1973), Gleser (1975), 
Cambanis et al (1976), Freedman (1980), Tong (1980), Franken 
and Lisek (1982), Laue (1983), Chen and Shepp (1983), Galambos 
(1984), Aldous (1985), Taylor et al (1985), Husler (1989) and 
Metry and Sampson (1993). For more abstract topics, see Laha and 
Rohatgi (1979), Rao (1979), Tyur (1980), Vahantya et al (1989), 
Gelbaum and Olmsted (1990), Dall’Aglio et al (1991), Ledoux and 
Talagrand (1991), Kalashnikov (1994) and Rao (1995). 


Section 14. Various kinds of convergence of sequences 
of random variables 


Examples 14.1, 14.2, 14.4, 14.5, 14.6, 14.8(i), 14.10(i), 14.12(i) or 
their modifications can be found in many publications. These 
examples can be classified as belonging to probabilistic folklore. 
Examples 14.3(i) and 14.7(i) are based on arguments by Roussas 
(1973), Laha and Rohatgi (1979) and Bauer (1996). Example 
14.3(ii) is due to Grimmett and Stirzaker (1982). Examples 14.7(ii) 
and 14.8(ii) are considered by Thomas (1971). Fortet (1977) has 
described Example 14.7(iii). Example 14.7(iv) is treated by Chung 
(1974). The idea of Example 14.9 is indicated by Feller (1971), 
Lukacs (1975) and Billingsley (1995). Cases (i) and (ii) of Example 
14.10 were suggested by Grimmett (1986) and Zubkov (1986) 
respectively. Example 14.11 is due to Rohatgi (1986) and a similar 
example is given in Serfling (1980). Case (ii) of Example 14.12 is 
briefly discussed by Cuevas (1987). Example 14.13 is presented in 
a form which is close to that of Ash and Gardner (1975). In 
Example 14.14 we follow Hsu and Robbins (1947) and Chow and 
Teicher (1978). Lukacs (1975) considers Examples 14.15 and 
14.18. Example 14.17 was suggested by Zubkov (1986). Cases (i) 
and (ii) of Example 14.16 are described following Lukacs (1975) 
and Billingsley (1995) respectively. Note that other useful 
counterexamples can be found in works by Neveu (1965), 
Kingman and Taylor (1966), Hettmansperger and Klimko (1974), 
Stout (1974a), Dudley (1976), Ganssler and Stute (1977), Bartlett 
(1978), Rao (1984), Ledoux and Talagrand (1991), Lessi (1993) 
and Shiryaev (1995). 


Section 15. Laws of large numbers 


Example 15.1(i) and its modifications can be classified as folklore. 
Examples 15.1(ii), 15.3 and 15.4 belong to Geller (1978). In 
Example 15.2 we follow the presentations of Lukacs (1975) and 
Bauer (1996). The statement in Example 15.5 is contained in many 
books: see those by Fisz (1963) or Lukacs (1975). Révész (1967) is 
the author of Example 15.6. Example 15.7 is based on papers by 
Prohorov (1950) and Fisz (1959). Example 15.8 is due to Hsu and 
Robbins (1947). Taylor and Wei (1979) describe Example 15.9. 
For a presentation of Example 15.10 see Stoyanov et al (1988). For 
the presentation of Example 15.11 we used papers by Jamison et al 
(1965), Chow and Teicher (1971) and Wright et al (1977). The 
classical Example 15.12 is described by Feller (1968). Finally, let 
us note that other counterexamples about the laws of large numbers 
and related topics can be found in works by Prohorov (1959), 
Jamison et al (1965), Lamperti (1966), Révész (1967), Chow and 
Teicher (1971), Feller (1971), Stout (1974a), Wright et al (1977), 
Asmussen and Kurtz (1980), Hall and Heyde (1980), Csörgő et al 
(1983), Dobric (1987), Ramakrishnan (1988) and Chandra (1989). 


Section 16. Weak convergence of probability measures 
and distributions 


Example 16.1 and other similar examples were originally 
described by Billingsley (1968) and have since appeared in many 
books and lecture notes. Chung (1974) considered Example 16.2, 
and its variations can be classified as folklore. Example 16.3, 
suggested by Robbins (1948), is presented in a form similar to that 
in Fisz (1963). Clearly, Example 16.4 belongs to probabilistic 
folklore. The idea of Example 16.5 was suggested by Zolotarev 
(1989). Takahasi (1971/72) is the originator of Example 16.6. The 
idea of Example 16.7 is indicated by Feller (1971). Example 16.8 
is considered by Kendall and Rao (1950). Example 16.9 is outlined 
by Rohatgi (1976). Example 16.10 is described by Laube (1973). 
Other counterexamples devoted to weak convergence and related 
topics can be found in works by Billingsley (1968, 1995), Breiman 
(1968), Sibley (1971), Borovkov (1972), Roussas (1972), Chung 
(1974), Lukacs (1975) and Eisenberg and Shixin (1983). 


Section 17. Central limit theorem 


Example 17.1(i) is based on arguments given by Ash (1972) and 
Chow and Teicher (1978). Cases (ii) and (iii) of the same example 
are considered by Thomasian (1969). Obviously Examples 17.2 
and 17.5 can be classified as folklore. Example 17.3 is considered 
by Ash (1972). The idea of Example 17.4 is to be found in Feller 
(1971). Zubkov (1986) suggested Example 17.6. Case (i) of 
Example 17.7 is considered by Gnedenko and Kolmogorov (1954), 
while case (ii) is taken from Malisic (1970) and is presented as it is 
given by Stoyanov et al (1988). Additional counterexamples 
concerning the CLT can be found in works by Gnedenko and 
Kolmogorov (1954), Fisz (1963), Rényi (1970), Feller (1971), 
Chung (1974), Landers and Rogge (1977), Laha and Rohatgi 
(1979), Rao (1984), Shevarshidze (1984), Janson (1988) and 
Berkes et al (1991). 


Section 18. Diverse limit theorems 


Example 18.1(i) is considered by Billingsley (1995). Case (ii) of 
this example and Example 18.3 are considered by Chow and 
Teicher (1978). Examples 18.2 and 18.4 are covered in many 
sources. Tomkins (1975a) is the author of Example 18.5, and 
Example 18.6 belongs to Neveu (1975). Basterfield (1972) 
considered Example 
18.7 and noted that this example was suggested by Williams. 
Examples 18.8 and 18.9 are considered by Lukacs (1975). 
Example 18.10 belongs to Arnold (1966), but also see Lukacs 
(1975). Example 18.11 is based on a paper by Stout (1979). 
Example 18.12 is given by Breiman (1967). Vasudeva (1984) 
treated Example 18.13. Example 18.14 is due to Resnik (1973). A 
great number of other counterexamples concerning the limit 
behaviour of random sequences can be found in the literature. 
However, some of these counterexamples are either very 
specialized or very complicated. The interested reader is referred to 
works by Spitzer (1964), Kendall (1967), Feller (1968, 1971), 
Moran (1968), Sudderth (1971), Roussas (1972), Greenwood 
(1973), Berkes (1974), Chung (1974), Stout (1974a), Kuelbs 
(1976), Hall and Heyde (1980), Serfling (1980), Tomkins (1980), 
Rosalsky and Teicher (1981), Prohorov (1983), Daley and Hall 
(1984), Boss (1985), Kahane (1985), Wittmann (1985), Sato 
(1987), Alonso (1988), Barbour et al (1988), Husler (1989), Adler 
(1990), Jensen (1990), Tomkins (1990, 1992, 1996), Hu (1991), 
Ledoux and Talagrand (1991), Rachev (1991), Williams (1991), 
Adler et al (1992), Fu (1993), Rosalsky (1993), Klesov (1995) and 
Rao (1995). 


Section 19. Basic notions on stochastic processes 


Example 19.1 is based on remarks by Ash and Gardner (1975) and 
Billingsley (1995). Examples 19.2, 19.3, 19.4(i), 19.6(i), 19.7, 19.8 
and 19.10(i) or modifications of them can be found in many 
textbooks and can be classified as probabilistic folklore. Case (ii) 
of Example 19.4 is considered by Yeh (1973). Example 19.5(i) is 
described by Kallianpur (1980), case (ii) belongs to Cambanis 
(1975), while case (iii) is given by Dellacherie (1972) and Elliott 
(1982). Example 19.6(ii) is due to Masry and Cambanis (1973). 
Example 19.9 is based entirely on a paper by Wang (1982). Cases 
(ii) and (iii) of Example 19.10 are given in a form similar to that of 
Morrison and Wise (1987). For other counterexamples concerning 
the basic characteristics and properties of stochastic processes we 
refer the reader to works by Dudley (1973), Kallenberg (1973), 
Wentzell (1981), Dellacherie and Meyer (1982), Elliott (1982), 
Metivier (1982), Doob (1984), Hooper and Thorisson (1988), 
Edgar and Sucheston (1992), Rogers and Williams (1994), 
Billingsley (1995) and Rao (1995). 


Section 20. Markov processes 


Examples 20.1(i) and 20.2(iii) are probabilistic folklore. Example 
20.1, cases (ii) and (iii), are due to Feller (1968, 1959). Case (iv) of 
Example 20.1 as well as Example 20.2(i) and (ii) are considered by 
Rosenblatt (1971, 1974). Example 20.2(iv) is discussed by 
Freedman (1971). Arguments which are essentially from Isaacson 
and Madsen (1976) are used to describe Example 20.3. According 
to Holewijn and Hordijk (1975), Example 20.4 was suggested by 
Runnenburg. Example 20.5 is due to Tanny (1974) and O’Brien 
(1982). Speakman (1967) considered Example 20.6. Example 20.7 
is considered by Dynkin (1965) and Wentzell (1981). Example 
20.8(i) is due to A. A. Yushkevich (see Dynkin and Yushkevich 
1956; and also Dynkin 1961; Wentzell 1981). Case (ii) of the same 
example is based on an idea from Wong (1971). Example 20.9 is 
considered by Ito (1963). A great number of other 
counterexamples (some of them very complicated) can be found in 
the works of Chung (1960, 1982), Dynkin (1961, 1965), Breiman 
(1968), Chung and Walsh (1969), Kurtz (1969), Feller (1971), 
Freedman (1971), Rosenblatt (1971, 1974), Gnedenko and 
Solovyev (1973), D. P. Johnson (1974), Tweedie (1975), Monrad 
(1976), Lamperti (1977), Iosifescu (1980), Wentzell (1981), 
Portenko (1982), Salisbury (1986, 1987), Ethier and Kurtz (1986), 
Grey (1989), Liu and Neuts (1991), Revuz and Yor (1991), Alabert 
and Nualart (1992), Ihara (1993), Meyn and Tweedie (1993), 
Courbage and Hamdan (1994), Rogers and Williams (1994), Pakes 
(1995) and Eisenbaum and Kaspi (1995). 


Section 21. Stationary processes and some related 
topics 


Examples 21.1 and 21.2 and other versions of them are 
probabilistic folklore. Example 21.3 is considered by Ibragimov 
(1962). Example 21.4 is based on arguments by Ibragimov and 
Rozanov (1978). Case (i) of Example 21.5 is discussed by 
Gaposhkin (1973), while case (ii) of the same example can be 
found in the paper by Verbitskaya (1966). Example 21.6 can be 
found in more than one source: we follow the presentation given 
by Shiryaev (1995); see also Ash and Gardner (1975). Example 
21.7 is due to Stout (1974b). Cases (i) and (ii) of Example 21.8 are 
found in the work of Grenander and Rosenblatt (1957), while case 
(iii) is discussed by Bradley (1980). For other counterexamples we 
refer the reader to works by Krickeberg (1967), Billingsley (1968), 
Breiman (1969), Ibragimov and Linnik (1971), Davydov (1973), 
Rosenblatt (1979), Bradley (1982, 1989), Herrndorf (1984), 
Robertson and Womak (1985), Eberlein and Taqqu (1986), 
Cambanis et al (1987), Dehay (1987a, 1987b), Janson (1988), 
Rieders (1993), Doukhan (1994) and Rosalsky et al (1995). 


Section 22. Discrete-time martingales 


Examples 22.1(i), 22.4(i) and 22.10 can be classified as 
probabilistic folklore. Example 22.1(i) is given by Neveu (1975), 
while case (ii) of the same example was proposed by Kuchler 
(1986). Example 22.2 is based on arguments by Yamazaki (1972). 
Case (i) and case (ii) of Example 22.3 are considered respectively 
by Kemeny et al (1965) and Freedman (1971). Examples 22.4(ii) 
and 22.5(i) were suggested by Melnikov (1983). Tomkins (1975b) 
described Examples 22.5(ii) and 22.7. Examples 22.6 and 22.8 can 
be found in Tomkins (1984b) and (1984a) respectively. Zolotarev 
(1961) is the author of Example 22.9, case (i), while case (ii) can 
be found in Shiryaev (1984). Example 22.11(i) is given by Stout 
(1974a) with an indication that it belongs to G. Simons. Case (ii) of 
the same example is treated by Neveu (1975), while the general 
possibility presented by case (iii) was suggested by Bojdecki 
(1985). Example 22.12(ii) was suggested by Marinescu (1985) and 
is given here in the form proposed by Iosifescu (1985). Example 
22.13(i) is considered by Edgar and Sucheston (1976a). Example 
22.13(ii) is based on Durrett (1991) and was suggested by P. 
Chigansky. The last example, 22.14, is based on a paper by Dozzi 
and Imkeller (1990). Other counterexamples concerning the 
properties of discrete-time martingales can be found in works by 
Cuculescu (1970), Nelson (1970), Baez-Duarte (1971), Ash 
(1972), Gilat (1972), Mucci (1973), Austin et al (1974), Stout 
(1974a), Edgar and Sucheston (1976a, 1976b, 1977), Blake (1978, 
1983), Janson (1979), Rao (1979), Alvo et al (1981), Gut and 
Schmidt (1983), Tomkins (1984a, b), Alsmeyer (1990) and Durrett 
(1991). 


Section 23. Continuous-time martingales 


Example 23.1(i) belongs to Doleans-Dade (1971). Case (ii) and 
case (iii) of the same example are described by Kabanov (1974) 
and Stricker (1986) respectively. According to Kazamaki (1972a), 
Example 23.2 was suggested by P. A. Meyer. Example 23.3 is 
given by Kazamaki (1972b). Johnson and Helms (1963) have 
given Example 23.4, but here we follow the presentation given by 
Dellacherie and Meyer (1982) and Rao (1979). Case (i) of 
Example 23.5 is treated by Chung and Williams (1990) (see also 
Revuz and Yor (1991)) while case (ii) was suggested to me by Yor 
(1986) (see Karatzas and Shreve (1991)). Example 23.6 is 
presented by Meyer and Zheng (1984). Example 23.7, considered 
by Radavicius (1980), is an answer to a question posed by B. 
Grigelionis. Example 23.8 belongs to Walsh (1982). Yor (1978) 
has treated topics covering Example 23.9. Example 23.10 was 
communicated to me by Liptser (1985) (see also Liptser and 
Shiryaev (1989)). According to Kallianpur (1980), Example 
23.11(i) belongs to H. Kunita, and the presentation given here is 
due to Yor. Case (ii) of the same example is considered by Liptser and 
Shiryaev (1977). Several other counterexamples (some very 
complicated) can be found in works by Dellacherie and Meyer 
(1982), Metivier (1982), Kopp (1984), Liptser and Shiryaev 
(1989), Isaacson (1971), Kazamaki (1974, 1985a), Surgailis 
(1974), Edgar and Sucheston (1976b), Monroe (1976), Sekiguchi 
(1976), Stricker (1977, 1984), Janson (1979), Jeulin and Yor 
(1979), Azema et al (1980), Kurtz (1980), Enchev (1984, 1988), 
Bouleau (1985), Merzbach and Nualart (1985), Williams (1985), 
Ethier and Kurtz (1986), Jacod and Shiryaev (1987), Dudley 
(1989), Revuz and Yor (1991), Yor (1992, 1996), Kazamaki 
(1994) and Pratelli (1994). 


Section 24. Poisson process and Wiener process 


Example 24.1 and its versions can be found in many sources and 
so can be classified as probabilistic folklore. According to 
Goldman (1967), Example 24.2 is due to L. Shepp. We present 
Example 24.3 following the paper of Szasz (1970). Example 24.4 
belongs to Jacod (1975). Hardin (1985) described Example 24.5. 
Example 24.6, cases (i), (ii) and (iii), was treated by Novikov 
(1972, 1979, 1983) (but see also Liptser and Shiryaev (1977)). 
Example 24.7 was created recently by Novikov (1996). Case (i) of 
Example 24.8 is considered by Jain and Monrad (1983); for case 
(ii) see Dudley (1973) and Fernandez De La Vega (1974). Case 
(ii) of the same example is the main result of Wrobel’s work 
(1982). An anonymous enthusiast from Marseille wrote a letter 
describing the idea behind Example 24.9. Example 24.10 belongs 
to Al-Hussaini and Elliott (1989). Several other counterexamples 
can be found in the works of Moran (1967), Thomasian (1969), 
Wang (1977), Novikov (1979), Jain and Monrad (1983), Kazamaki 
and Sekiguchi (1983), Panaretos (1983), Williams (1985), Daley 
and Vere-Jones (1988), Mueller (1988), Huang and Li (1993) and 
Yor (1992, 1996). 

Finally, let us pose one interesting question concerning the 
Wiener process. Suppose $X = (X_t, t \ge 0)$ is a process such that: (i) 
$X_0 = 0$ a.s.; (ii) any increment $X_t - X_s$ with $s < t$ is distributed 
$N(0, t - s)$; (iii) any two increments, $X_{t_2} - X_{t_1}$ and 
$X_{t_4} - X_{t_3}$, where $0 \le t_1 < t_2 \le t_3 < t_4$, are independent. 
Question: Do these conditions imply that $X$ is a Wiener process? 
Conjecture: No. (This was confirmed. See the paper by Follmer, Wu 
and Yor (2000) cited in the Appendix.) 


Section 25. Diverse properties of stochastic processes 


Example 25.1 belongs to Grünbaum (1972). Case (i) of Example 
25.2 is indicated in the work of Ephremides and Thomas (1974), 
while case (ii) of the same example was suggested to me by 
Ivkovic (1985). Example 25.3 is due to H. Tanaka (see Yamada 
and Watanabe 1971; Zvonkin and Krylov 1981; Durrett 1984). 
Example 25.4 was originally considered by Tsirelson (1975), but 
the proof of the non-existence of the strong solution given here 
belongs to N. V. Krylov (see also Liptser and Shiryaev 1977; 
Kallianpur 1980). For a variety of further counterexamples we 
refer the reader to the following sources: Kadota and Shepp 
(1970), Borovkov (1972), Davies (1973), Cairoli and Walsh 
(1975), Azema and Yor (1978), Rao (1979), Hasminskii (1980), 
Hill et al (1980), Kallianpur (1980), Krylov (1980), Metivier and 
Pellaumail (1980), Chitashvili and Toronjadze (1981), Csörgő and 
Révész (1981), Follmer (1981), Liptser and Shiryaev (1981, 1982), 
Washburn and Willsky (1981), Kabanov et al (1983), Le Gall and 
Yor (1983), Melnikov (1983), Van der Hoeven (1983), Zaremba 
(1983), Barlow and Perkins (1984), Hoover and Keisler (1984), 
Engelbert and Schmidt (1985), Ethier and Kurtz (1986), Rogers 
and Williams (1987, 1994), Rutkowski (1987), Küchler and 
Sorensen (1989), Maejyima (1989), Anulova (1990), Ihara (1993), 
Schachermayer (1993), Assing and Manthey (1995), Hu and 
Pérez-Abreu (1995) and Rao (1995). 


APPENDIX 


We use here the same standard terminology, abbreviations and 
notations as in the main body of the book and as generally 
accepted. The Appendix consists of two parts. 

First we give Key Words in alphabetical order, each followed by 
reference citations in chronological order. These are sources where 
the reader can find a counterexample on that topic. 

Then we provide complete bibliographic data of all references. 
Most of the papers and books, with a few exceptions, included in 
the New References, are published after 1996. All references are 
carefully selected from a huge pile of several thousands of 
‘candidates’. Chosen are references containing clearly formulated 
statements or facts which perfectly fall into the category 
counterexamples in probability and stochastic processes. 

There is a huge number of counterexamples in this area: some 
are available in the literature, others are the ‘private possession’ of 
professional researchers and/or teachers in stochastics. Anybody is 
welcome to contact me (stoyanovj@gmail.com) with specific 
suggestions, comments, sources, etc. There are ways to make all 
these available to the readers. 